- Remote, EU (Hungary)
- Engineering
- Intermediate
- Full-time
Description
We're looking for a Data Engineer to join our growing team, where there's room for everyone to improve their core skills and get better at mentoring and leadership. The Data Engineer role is part of our data team, which is responsible for developing and maintaining the data pipelines that power our products. As a Data Engineer, you will be responsible for developing solutions to integrate and process large datasets for our products.
What you'll be working on
- Analyze and verify that newly acquired datasets follow our specifications (see the sketch after this list)
- Build and maintain data pipelines to democratize data across the whole company
- Investigate, detect, monitor, and alert on data quality issues
- Participate in architecture discussions and contribute to technical and architectural decision-making
- Improve your own and your fellow developers' skills through design kick-offs, pairing sessions, and code reviews
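To give a flavor of the first two items, here's a minimal PySpark sketch of checking a partner feed against a spec before it enters a pipeline. The bucket path, column names, and value ranges are all hypothetical, not our actual spec.

```python
# A hypothetical sketch: validate a newly acquired dataset before processing.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataset-spec-check").getOrCreate()

# Hypothetical partner feed: the path and columns are made up for illustration.
df = spark.read.parquet("s3://example-bucket/partner-feeds/latest/")

# Check that the required columns are present before any processing happens.
expected = {"sku", "size", "fit_score"}
missing = expected - set(df.columns)
if missing:
    raise ValueError(f"feed is missing required columns: {missing}")

# Flag rows that violate the spec instead of silently dropping them.
invalid = df.filter(
    F.col("sku").isNull()
    | F.col("fit_score").isNull()
    | ~F.col("fit_score").between(0.0, 1.0)
)
n_bad = invalid.count()
if n_bad:
    raise ValueError(f"{n_bad} rows violate the feed specification")
```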
Who you'll be working with
- A small, smart, and highly capable team that builds and runs their services themselves
- A world-class QA team that understands our system better than anyone else
- A product team that makes decisions based on usability tests and usage data
The majority of our team is located in Budapest, Hungary, but you'll be able to work remotely anywhere in the EU. #LI-remote
Requirements
As a member of a small team, you'll need experience in a few key areas to be productive from day one. Everything else is fair game, and you'll have the opportunity to learn on the job.
Must-haves
- 3+ years of experience with the development and maintenance of data processing pipelines
- Experience with Python
- Experience with Spark
- A strong sense of clean code and good coding practices
- A strong testing culture
Bonus points
- You have experience with Python data processing libraries such as pandas
- You have experience with Airflow
- You have used cloud services before (AWS, GCP, Azure)
- You have used Databricks before
- You have used messaging services to communicate (Kafka, RabbitMQ, JMS, etc.)
- You are active on GitHub and have contributed back to open-source projects
If the role sounds interesting, apply now and get to know us during the interview process. You can read more about our hiring process on Glassdoor.
Tech Stack
At Secret Sauce, we use the technologies and tools we believe are right for the job at the time. We're not afraid to replace a technology or rewrite a service when gaining experience and understanding the domain better makes us realize that we made the wrong choice. We embrace change and work in a fast-paced environment, which means the stack we work with is the one we believe is best at any given moment. That makes us quite happy.
Our backend system consists of independent services built with Java and Ruby that communicate asynchronously through Kafka. We use Avro and a Schema Registry to enforce these interfaces. All our services are packaged with Docker and deployed to our AWS infrastructure using Kubernetes. Our infrastructure is immutable: we build AMIs with Packer and roll them out with Terraform. We don't have a "DevOps" or Ops team; we consider running services in a cloud environment part of the software engineering role.
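To illustrate the Kafka/Avro side, here's a minimal sketch of publishing a schema-checked event using the confluent-kafka Python client. The topic, schema, and endpoints are hypothetical, and our production services do this from Java and Ruby.

```python
# A hypothetical sketch: publish an Avro-encoded event, with the
# Schema Registry enforcing the interface between services.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Hypothetical schema: the registry rejects incompatible changes to it.
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "OrderPlaced",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount_cents", "type": "long"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(registry, ORDER_SCHEMA),
})

# Publish asynchronously; flush() blocks until the broker acknowledges.
producer.produce(topic="orders.placed", key="order-123",
                 value={"order_id": "order-123", "amount_cents": 4200})
producer.flush()
```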
The services we provide to our retail partners are integrated into their existing websites; we provide a single JavaScript library that they can use to unlock all of our products. Analytics, A/B testing, error reporting, and real-user monitoring are built in and available to Fit Predictor, Style Finder, and our future services. The services themselves are built using ES6, React/Flux, and modern JavaScript tooling.
Our data team loves Spark and uses it to process large datasets that we receive from our partners and that we produce ourselves. We don't run a persistent cluster; we process and move data between different data stores: S3, Kafka, PostgreSQL, and Snowflake are all part of the equation and are used where they make the most sense. We rely on Databricks to manage our Spark clusters and use Apache Airflow to orchestrate tasks and to monitor, schedule, and retry jobs.
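A pipeline of this shape could be wired up roughly like the hypothetical Airflow DAG below, which submits a PySpark job to an ephemeral Databricks cluster and leaves scheduling and retries to Airflow. Every name and cluster setting here is made up for illustration.

```python
# A hypothetical DAG sketch: Airflow schedules and retries the job,
# Databricks spins up an ephemeral Spark cluster to run it.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="partner_feed_ingest",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=15)},
) as dag:
    validate_and_load = DatabricksSubmitRunOperator(
        task_id="validate_and_load",
        new_cluster={  # no persistent cluster: one is created per run
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 4,
        },
        spark_python_task={
            # Hypothetical PySpark script: reads the partner feed from S3,
            # validates it against our spec, and writes it to Snowflake.
            "python_file": "s3://example-bucket/jobs/validate_and_load.py",
        },
    )
```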
We started out as a small development team using Ruby and Rails. We ended up with our current architecture and tech stack not because we use technology for technology's sake, but because we believe they are the right choice with the right trade-offs for our expertise, needs, and size.