Critical to achieving bp’s digital ambitions is the delivery of our high value data and analytics initiatives, and the enablement of the technologies and platforms that will support those objectives.
As a Data Engineer you will develop and maintain data infrastructure, and write, deploy and maintain the software that builds, integrates, manages and quality-assures data at bp. You are passionate about planning and building compelling data products and services, in collaboration with business stakeholders, Data Managers, Data Scientists, Software Engineers and Architects in bp.
You will be part of bp’s Data & Analytics Platform organization, the group responsible for the platforms and services that operate bp’s big data supply chain. The portfolio covers technologies that support the life cycle of critical data products in bp, bringing together data producers and consumers through enablement and industrial scale operations of data ingestion, processing, storage and publishing, including data visualization, advanced analytics, data science and data discovery platforms. You will be part of the Data Hub team, which is the data clearing house for all of bp’s big data and analytics requirements.
For this role specifically, you will be expected to develop the platform capability needed for our master data management and reference data workflows. You will also define and curate master/reference data for subsequent use and ingestion into the data hub and other consuming applications and projects. This will involve designing and developing the mechanisms that integrate master/reference data into the different data orchestration workflows across bp's value chains.
- Design, implement and maintain reliable and scalable data infrastructure, including design and development of industrial scale master data management systems on Azure or AWS data platforms and services.
- Design and develop software for distributed systems and data warehouses; execute on GDPR and other privacy requirements from digital security; build the business context and knowledge needed for the data domains you work in.
- Own the end-to-end technical data lifecycle and the corresponding data technology stack for your data domain, and develop a deep understanding of the bp technology stack.
- Write, deploy and maintain software to build, integrate, manage and quality-assure data; deploy secure and well-tested software that meets privacy and compliance requirements; develop, maintain and improve the CI/CD pipeline.
- Take responsibility for service reliability and follow site-reliability engineering best practices: participate in on-call rotations for the services you maintain, and define and maintain SLAs. Design, build, deploy and maintain infrastructure as code. Containerize server deployments.
- Actively mentor others, and contribute to or lead data engineering learning and development paths.
FORMAL EDUCATION & TECHNICAL SKILLS
- Deep and hands-on experience designing, planning, implementing, maintaining and documenting reliable and scalable data infrastructure and data products in complex environments.
- Development experience in one or more object-oriented programming languages (e.g. Python, Go, Java, Scala, C++)
- Experience with Elasticsearch, Kafka, Redis, HDFS.
- Experience with database systems such as MySQL or Postgres.
- Experience in developing master data management systems using industry-standard technologies such as Semarchy or Profisee.
- Experience designing and implementing large-scale distributed systems
- Deep knowledge and hands-on experience in technologies across all data lifecycle stages
- BS degree in computer science or related field
- Data Engineering: Ability to build cloud data solutions and provide domain perspective on storage, big data platform services, serverless architectures, Databricks, vendor products, RDBMS, DW/DM, NoSQL databases and security. Experience with micro-service architecture is a bonus.
- Asset Data Sets: Knowledge of master/reference data in the energy industry, including experience in classification, ontology and designing workflows of such data in data platforms
- Data Manipulation: debug and maintain the end-to-end data engineering lifecycle of the data products; design and implementation of the end-to-end data stack, including designing complex data systems, e.g. interoperability across cloud platforms; experience on various types of data (streaming, structured and un-structured) is a plus.
- Software Engineering: hands-on experience with SQL and NoSQL database fundamentals, query structures and design best practices, including scalability, readability, and reliability; proficiency in at least one object-oriented programming language, e.g. Python (including data manipulation and visualization packages such as Pandas, seaborn and matplotlib) or Scala, and with frameworks such as Apache Spark.
- Scalability, Reliability, Maintenance: proven experience in building scalable and re-usable systems that are used by others; knowledge and experience in automating operations as much as possible, identifying and building for long-term productivity over short-term speed/gains, and executing on opportunities to improve products or services.
- Data Domain Knowledge: proven understanding of data sources, data and analytics requirements, and the typical SLAs associated with data provisioning and consumption at enterprise scale.
- Agile Methodology: good knowledge and understanding of modern development methodologies (Agile using Scrum and/or Kanban).