Data Engineer (Machine Learning)
About the role:
Abcam is looking for a Data Engineer to join its growing Data Engineering team. The Data Engineer will be primarily responsible for productionising data science models and developing our Big Data platform to ensure it remains cutting edge and enables us to build impactful AI/ML products.
About Data Engineering:
The Data Engineering team is responsible for extracting, transforming, logically organising and storing data, as well as providing Data Warehousing and Big Data capabilities to enable the creation and delivery of Analytics and AI/ML products.
Roles & responsibilities:
- Maintain and enhance our Big Data platform to ensure it meets our business needs and enables us to build impactful AI/ML products.
- Work closely with data scientists and data engineers to productionise and deploy data science models.
- Use state-of-the-art technologies to acquire, ingest and transform large datasets.
- Design and support the development of data pipelines in line with engineering best practices.
- Curate, wrangle and prepare data to be used in data science models.
- Write performant, functional code whilst applying best practice.
- Write ETL scripts and code to make sure the ETL process performs optimally.
- Write complex procedures and functions, dynamic and procedural SQL.
- Independently design and update 3NF and Star Schema warehouses.
- Work closely with stakeholders including Data Scientists, Data Analysts, Data Architects, Infrastructure Engineers, Project Managers and Business Analysts.
- Work closely with non-technical business stakeholders and communicate complex technical concepts in a way they can understand.
- Stay abreast of developments in the world of Data Engineering and make recommendations about new technologies and ways of working.
Essential skills: AWS, Python, Git, SQL, Data Warehouse
Desirable skills: ODI, ETL, Mulesoft, Spark, Tableau, Oracle, Kafka
- You have experience in implementing Big Data solutions using one or more major cloud platforms (AWS, GCP, Azure).
- You have experience in building data pipelines in production and the ability to work across structured, semi-structured and unstructured data.
- You have practical experience with machine learning methods such as Linear Regression, Decision Trees, Random Forest and Deep Learning.
- You can write clean, maintainable, and robust code in Python or similar languages.
- You are experienced in using Git and CI/CD tools for version control and code releases.
- You are skilled in SQL. Experience of Oracle SQL would be beneficial.
- You can build strong relationships with stakeholders including Data Scientists, Data Analysts, Architects, Infrastructure Engineers, Product Managers and Business Analysts.
- You are a team player who supports other team members as necessary to ensure success.
- You have experience in performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Some background in life science would be beneficial but is not essential.
If this sounds like you and you’d like to be a part of a fast-paced, growing business with the vision to become the most influential company and best-loved brand in life sciences, please apply now!