Python Data Engineer

Project Description

In this role you will design, build, and maintain data pipelines, enabling seamless data integration and analysis, while collaborating with cross-functional teams to support data-driven decision-making.

Required Skills

Python (5)

Job Description

As a Python Data Engineer, you will play a pivotal role in designing, developing, and maintaining scalable data pipelines that contribute to our data-driven culture.

Job Responsibilities:

- Develop robust ETL pipelines for data extraction, transformation, and loading into our data lake/warehouse.
- Build performant APIs that are easy to integrate with.
- Integrate diverse data sources, ensuring quality and consistency across systems.
- Optimize pipeline performance.
- Implement data models for advanced analytics, reporting, and machine learning.
- Enforce data quality, validation, and governance processes.
- Leverage cloud platforms (AWS, Azure, GCP) for scalable data infrastructure.
- Monitor pipelines, troubleshoot issues, and enhance system resilience.
- Ensure data security and compliance with regulations.
- Collaborate with teams to develop data-driven solutions.
- Create detailed technical documentation.
- Stay updated on data engineering trends and tech advancements.

Qualifications

Who we're looking for:

- At least 6 years of proven experience as a Data Engineer.
- Strong proficiency in Python and experience with data manipulation libraries such as Pandas and NumPy.
- Extensive experience building and optimizing data pipelines with orchestration systems (e.g., Apache Airflow, Dagster, Airbyte, dbt Cloud) or custom Python scripts.
- Expertise with both relational and NoSQL databases, including hands-on experience in data modeling and database design.
- Experience provisioning and managing infrastructure on cloud platforms such as AWS, Azure, or GCP.
- Familiarity with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake) and distributed computing frameworks (e.g., Hadoop, Spark).
- Fluency in SQL with the ability to optimize complex queries for performance.
- Strong problem-solving and analytical skills, with the ability to work in a fast-paced and dynamic environment.
- Excellent communication skills and ability to collaborate effectively with cross-functional teams.
- Experience with version control systems (e.g., Git) and CI/CD pipelines.
- Familiarity with data science and statistical analysis concepts, machine learning, or large language models is a highly valued bonus.