.. core-etl documentation master file, created by
   sphinx-quickstart on Thu Mar 27 23:30:51 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

core-etl
===============================================================================

This library provides essential components for ETL processes, offering reusable
interfaces for seamless data extraction, transformation, and loading.

.. image:: https://img.shields.io/pypi/pyversions/core-etl.svg
    :target: https://pypi.org/project/core-etl/
    :alt: Python Versions

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-etl/-/blob/main/LICENSE
    :alt: License

.. image:: https://gitlab.com/bytecode-solutions/core/core-etl/badges/release/pipeline.svg
    :target: https://gitlab.com/bytecode-solutions/core/core-etl/-/pipelines
    :alt: Pipeline Status

.. image:: https://readthedocs.org/projects/core-etl/badge/?version=latest
    :target: https://readthedocs.org/projects/core-etl/
    :alt: Docs Status

.. image:: https://img.shields.io/badge/security-bandit-yellow.svg
    :target: https://github.com/PyCQA/bandit
    :alt: Security

Documentation Contents
-------------------------------------------------------------------------------

.. toctree::
    :maxdepth: 1
    :caption: Index:

    interfaces

Features
-------------------------------------------------------------------------------

**Base ETL Framework**

* Template method pattern for ETL workflow orchestration
* Lifecycle hooks for pre-processing, execution, post-processing, and cleanup
* Built-in error handling with detailed exception logging
* Task status tracking (``CREATED``, ``EXECUTING``, ``SUCCESS``, ``ERROR``)
* Timezone support for date/datetime processing (defaults to UTC)
* Temporary folder management for local file operations
* Extensible resource cleanup mechanisms

**File-Based ETL (IBaseEtlFromFile)**

* Process files from various sources (SFTP, local filesystem, cloud storage)
* Iterator-based file processing with error isolation per file
* Per-file success/error callbacks for custom handling
* Batch file operations with automatic error recovery
* Extensible hooks: ``get_paths()``, ``process_file()``, ``on_success()``, ``on_error()``

**Record-Based ETL (IBaseEtlFromRecord)**

* Process records from APIs, databases, files, message queues, and data streams
* Memory-efficient batch processing; batch sizing is the subclass's responsibility via ``retrieve_records()``
* Built-in transformation pipeline:

  * Field removal (``attrs_to_remove``)
  * Field renaming (``name_mapper``)
  * Data type casting (``type_mapper``)

* Pre- and post-transformation hooks for custom business logic
* Extensible methods: ``retrieve_records()``, ``process_records()``, ``pre_transformations()``, ``post_transformations()``

**Async ETL (IAsyncETL)**

* Concurrent record processing via an asyncio producer/consumer pattern
* Configurable worker pool size (``max_workers``) and queue capacity (``max_queue_size``)
* Individual record failures are isolated: failed records are logged and skipped without aborting the pipeline
* Extensible methods: ``produce_records()``, ``_process_record()``
* Note: ``execute()`` uses ``asyncio.run()`` internally, so it cannot be called directly from inside a running event loop; from async contexts, call ``await asyncio.to_thread(task.execute)`` instead

Quick Start
-------------------------------------------------------------------------------

All base classes are importable directly from the top-level package:

.. code-block:: python

    from core_etl import (
        IBaseETL,
        IBaseEtlFromFile,
        IBaseEtlFromRecord,
        IAsyncETL,
    )

Installation
-------------------------------------------------------------------------------

Install from PyPI using pip:

.. code-block:: bash

    pip install core-etl

    # Or using uv:
    uv pip install core-etl

    # For development (editable install with dev extras, from a source checkout):
    pip install -e ".[dev]"

Setting Up Environment
-------------------------------------------------------------------------------

1. Install the required tools:

   .. code-block:: bash

       pip install --upgrade pip
       pip install virtualenv

2. Create a Python virtual environment:

   .. code-block:: bash

       virtualenv --python=python3.12 .venv

3. Activate the virtual environment:

   .. code-block:: bash

       source .venv/bin/activate

Install packages
-------------------------------------------------------------------------------

Install the package from the source checkout; the second command installs it in
editable mode together with the development extras:

.. code-block:: bash

    pip install .
    pip install -e ".[dev]"

Check tests and coverage
-------------------------------------------------------------------------------

.. code-block:: shell

    python manager.py run-tests
    python manager.py run-coverage

Contributing
-------------------------------------------------------------------------------

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Write tests for new functionality
4. Ensure all tests pass: ``python manager.py run-tests``
5. Run linting: ``pylint core_etl``
6. Run security checks: ``bandit -r core_etl``
7. Open a merge request

License
-------------------------------------------------------------------------------

This project is licensed under the MIT License. See the ``LICENSE`` file for details.
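
Record Transformation Sketch
-------------------------------------------------------------------------------

The following is a standalone, hypothetical sketch of two ideas from the
Features list. The first is the template method pattern, which uses a fixed
skeleton with overridable hooks. The second is the record transformation
pipeline: field removal, renaming, and type casting. The sketch does not import
``core_etl`` and is not the library's implementation. The names
``transform_record``, ``RecordPipelineSketch``, and ``UsersSketch``, the hook
signatures, and the step order are all assumptions made for illustration. The
assumed order is remove, then rename, then cast, with ``type_mapper`` keyed by
the renamed field. See :doc:`interfaces` for the actual API.

.. code-block:: python

    from typing import Any, Callable, Dict, Iterable, Iterator, Optional

    # Hypothetical record type: a flat dict of field name -> value.
    Record = Dict[str, Any]


    def transform_record(
        record: Record,
        attrs_to_remove: Iterable[str] = (),
        name_mapper: Optional[Dict[str, str]] = None,
        type_mapper: Optional[Dict[str, Callable[[Any], Any]]] = None,
    ) -> Record:
        """Remove, rename, then cast fields of one record (order is assumed)."""
        to_remove = set(attrs_to_remove)
        renames = name_mapper or {}
        casters = type_mapper or {}

        # 1. Field removal: keep only keys not listed in attrs_to_remove.
        result = {key: value for key, value in record.items() if key not in to_remove}

        # 2. Field renaming: unmapped keys keep their original names.
        result = {renames.get(key, key): value for key, value in result.items()}

        # 3. Type casting: keys refer to the renamed fields (an assumption);
        #    missing keys and None values are skipped rather than cast.
        for key, caster in casters.items():
            if result.get(key) is not None:
                result[key] = caster(result[key])
        return result


    class RecordPipelineSketch:
        """Hypothetical template method: fixed steps with overridable hooks.

        Hook names echo the feature list above, but these signatures are
        assumptions, not the core_etl API.
        """

        attrs_to_remove: Iterable[str] = ()
        name_mapper: Dict[str, str] = {}
        type_mapper: Dict[str, Callable[[Any], Any]] = {}

        def pre_transformations(self, record: Record) -> Record:
            return record  # Override to run custom logic before the built-in steps.

        def post_transformations(self, record: Record) -> Record:
            return record  # Override to run custom logic after the built-in steps.

        def transform(self, records: Iterable[Record]) -> Iterator[Record]:
            # The skeleton is fixed; subclasses customize only hooks and config.
            for record in records:
                record = self.pre_transformations(record)
                record = transform_record(
                    record, self.attrs_to_remove, self.name_mapper, self.type_mapper
                )
                yield self.post_transformations(record)


    class UsersSketch(RecordPipelineSketch):
        """Example subclass: drop the password, rename uid, cast it to int."""

        attrs_to_remove = ("password",)
        name_mapper = {"uid": "user_id"}
        type_mapper = {"user_id": int}

        def post_transformations(self, record: Record) -> Record:
            record["loaded"] = True  # Tag each record after the built-in steps.
            return record


    rows = [{"uid": "7", "name": "Ana", "password": "x"}]
    print(list(UsersSketch().transform(rows)))
    # [{'user_id': 7, 'name': 'Ana', 'loaded': True}]

Keeping the step order in one method and exposing only hooks and configuration
lets subclasses change behavior without re-implementing the workflow. Because
``transform()`` is a generator here, records are handled one at a time instead
of being materialized all at once.
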

Links
-------------------------------------------------------------------------------

* **Documentation:** https://core-etl.readthedocs.io/en/latest/
* **Repository:** https://gitlab.com/bytecode-solutions/core/core-etl
* **Issues:** https://gitlab.com/bytecode-solutions/core/core-etl/-/issues
* **Changelog:** https://gitlab.com/bytecode-solutions/core/core-etl/-/blob/master/CHANGELOG.md
* **PyPI:** https://pypi.org/project/core-etl/

Support
-------------------------------------------------------------------------------

For questions or support, please open an issue on GitLab or contact the maintainers.

Authors
-------------------------------------------------------------------------------

* **Alejandro Cora González** - *Initial work* - alek.cora.glez@gmail.com