Data Engineering Capabilities and Personas
In this article, we will look at key data engineering capabilities at the intersection of several archetypes and personas.
In a previous article, we looked into the role of a data engineer and the general responsibilities associated with it. However, the field of data engineering has changed dramatically over the last decade, and this has led to different constellations of data teams. Depending on company size and maturity, more specialized personas have appeared, each requiring a specific set of capabilities. In today’s article, we will break down these personas and the capabilities expected of each.
So let’s get started!
Software Engineering at Heart
There is a lot of debate comparing the role of a data engineer with that of a traditional software engineer. Some argue that a data engineer is simply a specialized software engineer, while others question how complex data engineering really is and how much technical skill it requires. One thing we can take away from this debate, though, is that the lines between the two roles are blurring.
A data engineer should be able to build, maintain and test software architectures that handle data of varying complexity. This includes understanding the principles, patterns and practices of writing clean code that is easy to evolve, test and get into production. It also calls for a solid understanding of distributed systems and microservice architectures, particularly through the lens of Application Programming Interfaces (APIs), in order to implement secure, scalable and performant solutions.
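To make the idea of clean, testable code a bit more concrete, here is a minimal Python sketch (Python 3.9+). It keeps parsing and aggregation as pure functions with no I/O, so they can be unit-tested without a database or message queue. The `Order` record, the raw field names (`id`, `amount`, `date`) and all function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    """Typed, immutable representation of a hypothetical order record."""
    order_id: str
    amount_cents: int  # integer cents keep sums exact
    order_date: date


def parse_order(raw: dict) -> Order:
    """Convert a raw record (e.g. one row decoded from JSON) into an Order.

    No I/O happens here, which makes the function easy to test in isolation.
    """
    # Decimal avoids binary floating-point error when converting to cents.
    cents = (Decimal(str(raw["amount"])) * 100).to_integral_value()
    return Order(
        order_id=str(raw["id"]).strip(),
        amount_cents=int(cents),
        order_date=date.fromisoformat(raw["date"]),
    )


def daily_revenue(orders: list[Order]) -> dict[date, int]:
    """Sum order amounts (in cents) per calendar day."""
    totals: dict[date, int] = {}
    for order in orders:
        totals[order.order_date] = totals.get(order.order_date, 0) + order.amount_cents
    return totals


def test_daily_revenue() -> None:
    """A plain unit test; a runner such as pytest would discover it by its name."""
    raw_rows = [
        {"id": " A1 ", "amount": "19.99", "date": "2024-01-01"},
        {"id": "A2", "amount": "5.00", "date": "2024-01-01"},
        {"id": "A3", "amount": "1.50", "date": "2024-01-02"},
    ]
    orders = [parse_order(row) for row in raw_rows]
    assert orders[0].order_id == "A1"
    assert daily_revenue(orders) == {date(2024, 1, 1): 2499, date(2024, 1, 2): 150}


if __name__ == "__main__":
    test_daily_revenue()
    print("all tests passed")
```

The design choice worth noting is the separation of concerns: reading from and writing to external systems would live in thin wrapper functions, while the logic that actually changes over time stays pure and covered by fast tests.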
Data Platform Engineering
Another area data teams focus on is the design and operation of the infrastructure required to run different types of data workloads. This includes knowing the tradeoffs between on-premises and cloud infrastructure as well as related tools and practices such as infrastructure as code, monitoring, performance testing and optimization.
The purpose of building a data platform is to cover the end-to-end data lifecycle and related aspects, including:
- Data Pipelines:
  - Ability to build, deploy, and orchestrate data pipelines, along with knowledge of the technology options for implementing them. Examples include Extract-Transform-Load (ETL), Extract-Load-Transform (ELT), Change Data Capture (CDC), and batch vs. streaming pipelines.
  - Building data pipelines is not just about moving data from one system to another. It also involves continuously evaluating, monitoring and improving the quality of your data over time. Common data quality dimensions include completeness, timeliness, accuracy, integrity, and consistency.
- Data Modeling:
  - Ability to model data in different types of databases according to the data architecture and business needs. This includes RDBMS, data warehouses, key-value stores, document stores, graph databases, distributed file systems and columnar data stores.
- Data Storage:
  - Ability to understand and choose between different platforms and technology options for storing data. This includes different types of databases, data lakes, data warehouses, and data serialization formats.
- Data Governance:
  - Data governance encompasses company-wide principles, practices and organizational structures. It involves the ability to understand, design, and apply security controls around how data is shared and used across the enterprise, covering authorization, encryption, information security, compliance and regulatory requirements. In addition, familiarity with data privacy and ethics, including issues such as bias, is crucial for detecting and mitigating the potential threats, vulnerabilities and unintended consequences that can arise when using data.
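To illustrate Change Data Capture, the minimal Python sketch below uses its simplest variant: it compares two snapshots of a table keyed by primary key, emits insert, update and delete events, and replays them on a downstream copy. Production CDC is usually log-based instead (tools such as Debezium read the database's transaction log), so treat this as an illustration of the idea rather than a recommended implementation. The function names and the event tuple format are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# (operation, primary key, new row or None for deletes)
ChangeEvent = Tuple[str, Any, Optional[Row]]


def diff_snapshots(old: Dict[Any, Row], new: Dict[Any, Row]) -> List[ChangeEvent]:
    """Derive the change events that turn the `old` snapshot into the `new` one."""
    events: List[ChangeEvent] = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))
        elif old[key] != row:
            events.append(("update", key, row))
    for key in old:
        if key not in new:
            events.append(("delete", key, None))
    return events


def apply_events(target: Dict[Any, Row], events: List[ChangeEvent]) -> None:
    """Replay change events on a downstream copy, mutating it in place."""
    for op, key, row in events:
        if op == "delete":
            target.pop(key, None)
        else:  # inserts and updates are both upserts downstream
            target[key] = row


if __name__ == "__main__":
    yesterday = {1: {"name": "Ada"}, 2: {"name": "Bob"}, 4: {"name": "Dee"}}
    today = {1: {"name": "Ada"}, 2: {"name": "Bobby"}, 3: {"name": "Cy"}}
    events = diff_snapshots(yesterday, today)
    print(events)
    replica = dict(yesterday)
    apply_events(replica, events)
    assert replica == today
```

Only the changed rows travel downstream, so the replica stays in sync without a full reload. The trade-off of snapshot diffing is that it needs two full snapshots and misses intermediate changes between them, which log-based CDC does capture.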
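Data quality checks can be expressed as small, composable functions that run against each batch before it is published. The Python sketch below covers three of the dimensions mentioned above: completeness (share of non-null values), timeliness (age of the newest record) and integrity (duplicate primary keys). Accuracy and consistency usually need reference data or cross-system comparisons, so they are left out. Column names, thresholds and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def completeness(records: List[Record], column: str) -> float:
    """Fraction of records where `column` is present and not None."""
    if not records:
        return 1.0  # an empty batch has nothing missing
    filled = sum(1 for r in records if r.get(column) is not None)
    return filled / len(records)


def is_timely(records: List[Record], ts_column: str, max_age: timedelta, now: datetime) -> bool:
    """True if the newest timestamp in the batch is at most `max_age` old."""
    timestamps = [r[ts_column] for r in records if r.get(ts_column) is not None]
    if not timestamps:
        return False
    return now - max(timestamps) <= max_age


def duplicate_keys(records: List[Record], key: str) -> Set[Any]:
    """Return key values that occur more than once (an integrity check)."""
    seen: Set[Any] = set()
    duplicates: Set[Any] = set()
    for r in records:
        value = r.get(key)
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return duplicates


def run_checks(records: List[Record], now: datetime) -> Dict[str, bool]:
    """Evaluate a few example checks; columns and thresholds are hypothetical."""
    return {
        "email_completeness_at_least_95pct": completeness(records, "email") >= 0.95,
        "updated_within_last_hour": is_timely(records, "updated_at", timedelta(hours=1), now),
        "customer_id_unique": not duplicate_keys(records, "customer_id"),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"customer_id": 1, "email": "a@example.com", "updated_at": now - timedelta(minutes=30)},
        {"customer_id": 2, "email": None, "updated_at": now - timedelta(hours=3)},
        {"customer_id": 2, "email": "b@example.com", "updated_at": now - timedelta(hours=2)},
    ]
    print(run_checks(batch, now))
```

For the sample batch, the completeness and uniqueness checks fail while the freshness check passes. In a real pipeline, a failing check would typically block promotion of the batch or raise an alert, and the results would be tracked over time to see whether quality is improving.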
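As one concrete example of a governance control, the Python sketch below pseudonymizes sensitive columns with a keyed hash (HMAC-SHA256 from the standard library) before data is shared with other teams. Note that this is pseudonymization, not encryption: the same input always maps to the same token, so masked tables can still be joined, and unlike a plain hash, the secret key prevents anyone without it from recomputing tokens for guessed values. The column list, the normalization rule and the inline demo key are hypothetical; a real key would come from a secrets manager.

```python
import hashlib
import hmac
from typing import Any, Dict, FrozenSet

# Hypothetical policy: columns treated as personal data.
SENSITIVE_COLUMNS: FrozenSet[str] = frozenset({"email", "phone"})


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic HMAC-SHA256 hex token for `value`.

    Trimming and lowercasing first makes 'Ada@Example.com ' and
    'ada@example.com' map to the same token, so joins still work.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_record(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return a copy of `record` with non-null sensitive values replaced by tokens."""
    return {
        column: (
            pseudonymize(str(value), key)
            if column in SENSITIVE_COLUMNS and value is not None
            else value
        )
        for column, value in record.items()
    }


if __name__ == "__main__":
    demo_key = b"demo-only-key"  # illustration only; load real keys from a secrets store
    row = {"customer_id": 42, "email": "Ada@Example.com ", "phone": None, "country": "DE"}
    print(mask_record(row, demo_key))
```

Keeping the list of sensitive columns in one place makes the policy reviewable, and rotating the key changes every token, which is worth planning for when masked data is joined across systems.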
Analytics Engineering
One more area to cover is analytics engineering, where data engineering teams focus on extracting insights and knowledge from processed data at the end of the data lifecycle. In addition to understanding multidimensional modeling and data warehousing technologies, it involves the ability to derive actionable insights and to deliver clear reports, dashboards and KPIs with compelling, effective visualizations that inform stakeholders and support business decision-making.
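Multidimensional modeling often takes the form of a star schema: a fact table of measurable events joined to dimension tables that describe them. The minimal Python sketch below stands in for the SQL a warehouse would normally run. It joins a hypothetical sales fact table to a product dimension, aggregates revenue by month and category, and derives a simple KPI (each category's share of monthly revenue) of the kind a dashboard might show. Table contents and field names are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SaleFact:
    """One row of a hypothetical sales fact table."""
    product_id: int     # foreign key into the product dimension
    month: str          # e.g. "2024-01"; a fuller model would use a date dimension
    revenue_cents: int  # additive measure


# Hypothetical product dimension keyed by product_id.
PRODUCT_DIM: Dict[int, Dict[str, str]] = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gizmo", "category": "Hardware"},
    3: {"name": "Handbook", "category": "Books"},
}


def revenue_by_month_and_category(
    facts: List[SaleFact], products: Dict[int, Dict[str, str]]
) -> Dict[Tuple[str, str], int]:
    """Join facts to the product dimension and sum revenue per (month, category)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for fact in facts:
        category = products[fact.product_id]["category"]  # dimension lookup
        totals[(fact.month, category)] += fact.revenue_cents
    return dict(totals)


def category_share(totals: Dict[Tuple[str, str], int], month: str) -> Dict[str, float]:
    """KPI: each category's share of total revenue in `month`."""
    month_totals = {cat: cents for (m, cat), cents in totals.items() if m == month}
    grand_total = sum(month_totals.values())
    if grand_total == 0:
        return {}
    return {cat: cents / grand_total for cat, cents in month_totals.items()}


if __name__ == "__main__":
    facts = [
        SaleFact(1, "2024-01", 2000),
        SaleFact(2, "2024-01", 1500),
        SaleFact(3, "2024-01", 500),
        SaleFact(1, "2024-02", 1000),
    ]
    totals = revenue_by_month_and_category(facts, PRODUCT_DIM)
    print(totals)
    print(category_share(totals, "2024-01"))
```

Keeping measures additive (integer cents) in the fact table and descriptive attributes (category) in the dimension is what lets the same facts be sliced by any attribute without reshaping the data.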
With this, we have reached the end of this post. I hope you enjoyed it!
Let me know which other teams we should include.
If you have any remarks or questions, please don’t hesitate to drop a comment below.
Stay tuned!
Recap
In this article, we discussed core capabilities required across different personas and team structures. Understanding these constellations helps organizations build effective teams and deliver value.
Happy learning!