By joining the Data Engineering team, you will be part of an early-stage team that builds the data transport, collection, and storage systems at Heetch. The team is quite new, and you will have the opportunity to shape its direction while having a large impact.
You will own Heetch's data platform by architecting, building, and launching highly scalable infrastructure and reliable data pipelines that will support our growing data processing and analytics needs. You will also create the tooling that allows users to be self-sufficient, building their own pipelines and data processing workflows.
Your efforts will make incredibly rich insights accessible to Data Analysts, Data Scientists, Operations Managers, Product Managers, and many others.
WHAT YOU’LL DO
• Implement and be responsible for a scalable data processing infrastructure that will evolve based on business and engineering needs.
• Build large-scale batch data pipelines.
• Build large-scale real-time data pipelines.
• Be responsible for scaling data processing flows to meet the rapid data growth at Heetch.
• Continuously improve and evolve the data model and schema based on business and engineering needs.
• Implement systems tracking data quality and consistency.
• Develop tools supporting self-service data pipeline management (ETL).
• Tune jobs to improve data processing performance.
• Implement data and machine learning algorithms (e.g., A/B testing, sessionization).
• Work with the business, product and engineering teams to create and implement a holistic data architecture.
WHAT YOU’LL NEED
• 4+ years in Software Engineering with a focus on Data Engineering.
• Extensive experience with Spark or other cluster-computing frameworks.
• Advanced SQL skills (complex queries, query engines, performance tuning).
• Strong skills in Python, Scala, or Java.
• Experience with workflow management tools (Airflow, Oozie, Azkaban, Luigi).
• Comfortable working directly with data analysts to bridge business requirements with data engineering.
• Inventive and self-starting.
• Experience with Kafka.
• Experience designing and developing data warehouses.
• Experience with data engineering in AWS or another cloud environment.
• Performance tuning and administration of Spark, Kafka, Hive, and Redshift.
• Experience deploying and managing AWS infrastructure.
• Experience building data models for normalizing/standardizing varied datasets for machine learning/deep learning.
• Experience working on a remote team.
• Experience developing data engineering tools.
PERKS
• Paid conference attendance/travel.
• Heetch credits.
• A Spotify subscription.
• Medical care.
• Code retreats and company retreats.
• Travel budget (visit your remote coworkers and our offices).
OUR ENGINEERING VALUES
- Move smart: we are data driven, and employ tools and best practices to ship code quickly and safely (continuous integration, code review, automated testing, etc).
- Distribute knowledge: we want to scale our engineering team to a point where our contributions do not stop at the company code base. We believe in the Open Source culture and communication with the outside world.
- Leave code better than you found it: because we constantly raise the bar.
- Unity makes strength: moving people from A to B is not as easy as it sounds, but we always keep calm and support each other.
- Always improve: we value personal progress and want you to look back proudly on what you’ve done.