
Introducing Data Mesh: A New Approach to Data Management

Written by:

Charlotte Ledoux

Reading duration: 4 min

2023-01-25

The data mesh approach allows teams to own and manage their own data, while still providing the benefits of a federated governance model. It is based on four key principles:

  • Data as a product;
  • Decentralized ownership;
  • Self-service infrastructure;
  • Federated computational governance.

The switch from centralized to decentralized data in organizations refers to the move from traditional data management approaches, such as data lakes and data warehouses, to more modern, decentralized approaches, such as data mesh. Organizations have been going through this shift for a few years now.

In traditional data management, data is typically stored in a central repository, such as a data lake or data warehouse, which forms a monolith accessed by various teams and systems. This can be useful for providing a single source of truth and enabling organizations to gain insights from their data. However, it can also be inflexible and difficult to scale, as teams may need to go through complex data integration processes to access the data they need. Business entities have evolved and are calling for a new model: they now have their own data scientists or data experts, and they want a faster time-to-market for their analytics and insights requirements.

[Figure: Different types of organization for leveraging data]

Data mesh is based on the idea of decentralized, self-serve data access, with data treated as a first-class citizen and fully distributed across the organization. This approach allows teams to own and manage their own data, while still providing the benefits of a federated governance model.

Key principles

One of the key concepts of data mesh is the "domain": a self-contained unit of related data that is owned and managed by a specific team or business unit. Domains are connected through a shared data model, so teams can access and use data from other domains while still keeping control over their own. This gives greater flexibility and agility, as teams can quickly access and use the data they need without going through complex data integration processes.
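
To make the idea of domains more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Domain` and `Mesh` classes, the "sales" domain and the "orders" dataset are all hypothetical names invented for this example, not part of any data mesh tool or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A self-contained unit of related data owned by one team (hypothetical model)."""
    name: str
    owner_team: str
    datasets: dict = field(default_factory=dict)  # dataset name -> list of records

    def publish(self, dataset_name, records):
        # Only the owning domain writes to its own datasets.
        self.datasets[dataset_name] = list(records)


class Mesh:
    """Hypothetical registry that lets domains discover each other's data."""

    def __init__(self):
        self._domains = {}

    def register(self, domain):
        self._domains[domain.name] = domain

    def read(self, domain_name, dataset_name):
        # Return a shallow copy of the list, so consumers cannot add or
        # remove rows in the owner's dataset.
        return list(self._domains[domain_name].datasets[dataset_name])


mesh = Mesh()
sales = Domain(name="sales", owner_team="sales-analytics")
sales.publish("orders", [{"order_id": 1, "amount": 120.0}])
mesh.register(sales)

# Another team reads sales data directly, without a central integration step.
orders = mesh.read("sales", "orders")
orders.append({"order_id": 2, "amount": 80.0})  # changes the consumer's copy only
```

In this sketch the consuming team works on its own copy of the list, while the sales domain remains the only writer of the original dataset.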

Data mesh is based on four core principles:

  • Data as a product: data is treated as a valuable product that is owned and managed by teams and business units, rather than by a centralized function. In this customer-centric approach, the analytical data provided by the domains is treated as a product, and the consumers of that data are treated as customers.
  • Decentralized ownership: data ownership is decentralized, with teams and business units owning and managing their own data. Each domain becomes responsible for the data it is most familiar with: the owner is the domain that is the first-class user of the data or that controls its point of origin.
  • Self-service infrastructure: domain teams must be able to independently build and maintain their own data products. A self-serve data platform must offer tooling that supports a domain data product developer’s workflow of creating, maintaining and running data products with less specialized knowledge than existing technologies assume. This includes access to scalable polyglot data storage, data product schemas, data pipeline declaration and orchestration, data product lineage, compute and data locality, etc.
  • Federated computational governance: a data mesh implementation requires a governance model that embraces decentralization and domain self-sovereignty, interoperability through global standardization, a dynamic topology and, most importantly, automated execution of decisions by the platform. A central data governance team creates the policies and rules that apply across the organization, to achieve consistency and compliance. Each domain team owns local governance at the level of its own data products (the "quantum" level), which makes the most of the team’s expertise.
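
As a rough illustration of how "data as a product" and "federated computational governance" can fit together, the Python sketch below has a domain publish a small descriptor for its data product, which is then checked automatically against global policies. Everything in it is an assumption made for the example: the `DataProduct` fields, the three policies and the sample products are hypothetical and do not come from a specific platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    domain: str           # decentralized ownership: the owning domain
    owner: str            # contact accountable for the product
    schema: dict          # column name -> type name
    pii_columns: tuple = ()
    description: str = ""


# Global policies, defined once by the central governance team.
# Each policy returns an error message, or None when the product complies.
def has_owner(product):
    return None if product.owner else "missing owner"


def has_description(product):
    return None if product.description else "missing description"


def pii_is_declared_in_schema(product):
    unknown = [c for c in product.pii_columns if c not in product.schema]
    return f"PII columns not in schema: {unknown}" if unknown else None


GLOBAL_POLICIES = [has_owner, has_description, pii_is_declared_in_schema]


def validate(product, policies=GLOBAL_POLICIES):
    """Run every policy automatically and collect the violations."""
    return [msg for policy in policies if (msg := policy(product)) is not None]


orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-analytics@example.com",
    schema={"order_id": "int", "customer_email": "str", "amount": "float"},
    pii_columns=("customer_email",),
    description="One row per confirmed customer order.",
)
draft = DataProduct(name="leads", domain="marketing", owner="", schema={})

print(validate(orders))  # compliant product: no violations
print(validate(draft))   # incomplete product: owner and description are missing
```

The design choice worth noting is the split of responsibilities: the central team only writes the policy functions, the domain team only fills in its descriptor, and the check itself runs without manual review.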

New approach, new tools?

In addition, data mesh encourages the use of modern data technologies, such as cloud-native architecture, event-driven data pipelines, and AI-powered data analytics. For a while now, the term “Modern Data Stack” has been everywhere. Put simply, it is a set of cloud-hosted tools that allows an organization to integrate data efficiently. It is also the foundation of DataOps and MLOps. The Modern Data Stack enables the creation of clean, reliable and always-available data that users can self-serve, thus fostering a truly data-driven culture.

It is typically composed of several layers stacked on top of each other (like a cake), and each layer has its own function: ingestion, storage, transformation, analytics and governance. The Modern Data Stack configuration is modular and designed to be compatible with other components and tools (plug-and-play). This concept fits the data mesh approach: the Modern Data Stack built by each domain generates and exposes data as a product, which can then be used by any other domain or pulled into the enterprise data warehouse.
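
The layered, plug-and-play idea can be sketched in a few lines of Python. This is a toy model, not a real stack: the layer functions, the `run_stack` helper and the "id,amount" input format are hypothetical, and the storage and governance layers are left out to keep the example short.

```python
def ingest(raw_rows):
    # Ingestion: parse raw "id,amount" strings into records.
    records = []
    for row in raw_rows:
        order_id, amount = row.split(",")
        records.append({"order_id": int(order_id), "amount": float(amount)})
    return records


def transform(records):
    # Transformation: drop invalid rows (non-positive amounts).
    return [r for r in records if r["amount"] > 0]


def analyze(records):
    # Analytics: a single aggregate exposed to consumers.
    return [{
        "order_count": len(records),
        "revenue": sum(r["amount"] for r in records),
    }]


def run_stack(data, layers):
    # Each layer only sees the previous layer's output, so any layer
    # can be swapped for another one with the same input and output shape.
    for layer in layers:
        data = layer(data)
    return data


raw = ["1,120.0", "2,-5.0", "3,80.0"]
result = run_stack(raw, [ingest, transform, analyze])
```

Because each layer is an independent function, a domain could replace `transform` with its own logic without touching ingestion or analytics, which is the modularity the paragraph above describes.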

Overall, data mesh represents a new way of thinking about data management that can help organizations to overcome the challenges of traditional approaches and to unlock the full potential of their data.

But you may face some obstacles, and most of them will be at the organizational level. Where should you start? What are the different domains in your organization? Is there a data mesh blueprint somewhere? How do you design a data mesh architecture, including the microservices and APIs needed to access the data, as well as the data governance and security strategy? Which data product should you deploy first?

Stay tuned, as we prepare more articles on how to start the transition!

Sources

  1. Zhamak Dehghani - martinfowler.com - Data Mesh Principles and Logical Architecture
  2. Zhamak Dehghani - oreilly.com - Data Mesh
  3. Sven Balnojan - Medium - Data Mesh Applied
  4. Juha Korpela - ellie.ai - Data Mesh: Enabling cross-domain communication with Data Models
  5. Merelda Wu - neptune.ai - What’s So Modern About the Modern Data Stack?
  6. Nagendra Nukala - Medium - Use Data Mesh pattern to stitch Modern Data Stack and specialized systems together