I spoke with an expert panel about the advantages of data mesh, which is a decentralized data architecture that organizes data by specific business domain. Among other advantages, data mesh allows self-service data access across an organization – along with governance – which can enable significant competitive advantage.
The panelists:
- Ana Matei, who leads the Customer Solutions Architecture space for Capital One Software.
- Vishal Shah, Data Architect Manager at Pitney Bowes.
This transcript has been edited for length and clarity.
eWeek: Why are companies moving toward data mesh architecture?
Ana Matei: I think there’s a lot to unpack here. I would like to start by setting the stage with what data mesh is at its core. In the simplest terms, it is an architectural concept, but also a paradigm shift that distributes the handling of data in an organization to the individual lines of business, domains or individual teams.
It is a new approach to managing and distributing data within large organizations. In that sense, data mesh departs from the traditional approach of centralizing data responsibilities under one large data team and instead allows companies to access and analyze data at scale.
So the core idea here is decentralizing data ownership and management across the organization by treating that data as a product.
Data mesh, in my opinion, has emerged as an important framework to help companies scale in a well-managed cloud data ecosystem, in a complex data environment in which volumes and sources of data are growing exponentially by the day.
There are four main principles that support the data mesh architecture that I would like to run through at a super high level:
Data as a product: So think of this as data teams applying product thinking to their data sets. In other words, an organization can assign a product owner to a data set and apply the same rigorous product principles to data assets to provide real value to its end consumers.
These data products should be developed, versioned and managed as software would be.
Data ownership: This translates into data ownership being federated among domain experts who are responsible for producing assets for analysis and business intelligence.
Self-service data platform: This would essentially be a platform that handles the underlying infrastructure needs of processing data while providing the tools necessary for domain-specific autonomy.
Federated computational governance: This essentially translates into a universal set of centrally defined standards that ensure conformity to data quality, security and policy compliance across all these different data domains and data owners.
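To make these principles a little more concrete, here is a minimal Python sketch of how a data product descriptor and a centrally defined governance check might fit together. It is illustrative only and is not something the panelists described: the `DataProduct` fields, the `governance_violations` function, and the specific rules (a named owner, a three-part version, a documented schema, a review tag for personal data) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team fills in for its data product."""
    name: str
    domain: str             # owning business domain (data ownership)
    owner: str              # accountable product owner (data as a product)
    version: str            # versioned like software, e.g. "1.2.0"
    schema: dict[str, str]  # column name -> type, documents the product
    contains_pii: bool = False
    tags: frozenset[str] = frozenset()


def governance_violations(product: DataProduct) -> list[str]:
    """Assumed central rules that every domain's product must satisfy."""
    violations = []
    if not product.owner:
        violations.append("missing product owner")
    if len(product.version.split(".")) != 3:
        violations.append("version must be MAJOR.MINOR.PATCH")
    if not product.schema:
        violations.append("schema must be documented")
    if product.contains_pii and "pii-reviewed" not in product.tags:
        violations.append("PII data requires the 'pii-reviewed' tag")
    return violations


# Two domains describe their own products; the same central rules apply to both.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    version="1.2.0",
    schema={"order_id": "string", "amount": "decimal"},
)
customers = DataProduct(
    name="customer_profile",
    domain="marketing",
    owner="",
    version="2.0",
    schema={"email": "string"},
    contains_pii=True,
)

for product in (orders, customers):
    print(product.name, governance_violations(product))
```

The point of the sketch is the split of responsibilities: each domain owns and describes its product, while the rules it is checked against are defined once and applied everywhere.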
eWeek: What are three reasons why companies are moving toward a data mesh architecture?
Ana Matei: In no particular order, I would say scalability and agility, as the domain teams within this construct can respond quickly to changing business requirements without relying on centralized data teams for every data-related task, which typically adds a lot of delays and bottlenecks.
Another main reason is reduced data silos, which have historically plagued teams in traditional architectures. So by promoting a data-as-a-product approach, each domain can take ownership of its data and make it available for consumption by other teams.
The third reason is the empowerment of domain experts to manage all of the products in their business units independently without having to rely on centralized data engineering teams.
Vishal Shah: Complementing what Ana said, even on a very basic level for people to understand data mesh, a lot of people are moving to the cloud and normally the journey to the cloud is centralized – all the data is in one place so that people can use it.
But very soon they realize that centralizing the data in one place means that there’s one team which is managing all the data. So when those teams are contacted for various kinds of analytic solutions, operational solutions, there’s resource constraint.
That team alone cannot do everything. That team does not know how the data is being used. That team does not know what business purpose this data is going to be used for. So when data mesh came in, the whole purpose was to basically take that ownership from one central team doing all the data-related activity and create a cross-functional team, which will consist of the actual data staff.
There’ll be a BI person, who is a business analyst, and there would be a dashboarding person, who actually takes the data and creates dashboards. There’ll be data owners, so people who are actually generating the data.
And what this cross-functional team will then do is come together and come up with a data set or data product, as Ana mentioned, to make it available for people to use.
And the whole beauty of that is that any person in the team now knows what the data is, where it is coming from, how it is being used, and what is the impact on it.
So there’s knowledge sharing within the team and there’s also responsibility, like ownership of the whole team instead of one person on one team owning it.
So at a very basic level, that’s what data mesh provides. It gives you the control of what people are consuming, the control on how you will publish something and how people consume it and what is the impact it’s going to create.
And the data governance portion that Ana mentioned: that assists this whole piece by ensuring that there’s full audit control on what is being generated and what has been published by documenting it. [It] sets up observability, sets up data quality metrics so that governance helps people to make sure that what you’re publishing is always audited basically, and becomes a trustworthy system.
eWeek: It seems like a big advantage of data mesh is the self-service aspect. Data mesh democratizes access to the data and enables a company to move much faster.
Vishal Shah: Absolutely. That’s a very, very good point. It does help democratize the data because now when you make the data as a product…you are basically ensuring that the data which is being published is complete.
It’s fully trustworthy, it is documented and it has full audit control on it. If something goes wrong, people are automatically notified, there’s full automation. And then there’s also the self-serve piece of that data product, which makes this data product available to other people without even contacting someone – you can actually subscribe to the data product and start using it.
So that is the beauty of that self service. Definitely a big, big piece of the data mesh.
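As a rough illustration of the self-serve idea, the following Python sketch models a catalog where consumers subscribe to a published data product on their own and are notified automatically when something goes wrong. The `DataCatalog` class and its `publish`, `subscribe` and `report_issue` methods are hypothetical names invented for this example; they do not refer to any tool the panelists mentioned.

```python
from collections import defaultdict


class DataCatalog:
    """Hypothetical in-memory catalog of published data products."""

    def __init__(self):
        self._products = {}                   # product name -> description
        self._subscribers = defaultdict(set)  # product name -> consumer names
        self.notifications = []               # (consumer, product, message)

    def publish(self, name, description):
        """A domain team makes a documented data product discoverable."""
        self._products[name] = description

    def subscribe(self, consumer, name):
        """A consumer signs up without contacting the owning team."""
        if name not in self._products:
            raise KeyError(f"unknown data product: {name}")
        self._subscribers[name].add(consumer)

    def report_issue(self, name, message):
        """Record a notification for every subscriber of the product."""
        for consumer in sorted(self._subscribers[name]):
            self.notifications.append((consumer, name, message))


catalog = DataCatalog()
catalog.publish("orders_daily", "One row per order, refreshed nightly")
catalog.subscribe("finance-dashboard", "orders_daily")
catalog.subscribe("ml-forecasting", "orders_daily")

# A failed quality check on the product fans out to everyone who uses it.
catalog.report_issue("orders_daily", "row count dropped unexpectedly")
for note in catalog.notifications:
    print(note)
```

A real platform would add access control, delivery channels and audit logging; the sketch only shows the subscribe-then-notify flow.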
eWeek: Is data mesh an optimal solution for every company?
Ana Matei: Data mesh right now may not be the right answer for everyone. I think when considering data mesh as an approach, I would encourage companies to think about a few key aspects to make that determination.
They should understand how complex and how vast their data inventory is because data mesh is best suited for companies dealing with significant data complexity and scale. So I’m talking about organizations where the data landscape is growing rapidly, with diverse data sources, complex data models and multiple data consumers across the various domains as well as data producers.
Vishal Shah: I would like to add one more thing. It also is determined by the size of the company. If you have a very small engineering team supporting a few data sets, it does not make sense to actually create that whole architecture because you’re not going to get that kind of benefit. And then sometimes in a smaller company, having this separated out, as Ana was mentioning, people may not be always open to it because it is definitely a way to work differently than what you’re used to.
So an engineer told me that, okay, he was doing this engineering work only but now he’s told he’s no longer an engineer alone, he has to work with the analyst to create the aggregation. He has to work with the reporting process to make sure the reports come out. So he needs to know other things.
And that change, as Ana was mentioning, is very important. If people are open to that change, then it becomes easier to implement. And with a smaller company or even a startup, they don’t have that big a team to actually utilize the benefit. I’m not saying they cannot do it, it’s just that they may not get the same kind of benefit that data mesh provides.
eWeek: What are the common issues and challenges that companies encounter as they operationalize data mesh? How do you recommend addressing these challenges?
Vishal Shah: So as I said in my earlier conversation, basically you need to first look at the data that you are publishing. You have lots of data that multiple teams want to use. That is the first criterion you apply: this data is going to be used by my five different teams and these are the kinds of solutions they’re building on top of it.
Once you know what the use case is for that data and what the business impact of that data is, that’s where you can decide how to create that domain-centric data.
Now the initial challenge that they might face is basically, first of all, having a cross-functional team across different business units. Because a lot of times in very large companies, different business units don’t talk to each other the way they do in a small company.
Ana Matei: I’m going to choose the three key challenges that I think are most prevalent.
So one is around data governance and standardization. It’s also something we have learned through our journey at Capital One, and I’ve heard other companies talk about it as well: how to ensure that there is consistency across the different domains in how governance is established, standardized and enforced across the board.
So to address this challenge, I would recommend companies ensure that they have a clearly established governance framework that all the different stakeholders and teams involved have aligned on through collaboration. And they should also have a regular means of communication through governance committees or any other shared methodologies of communication or documentation.
Another challenge is monitoring and observability, right? So data flows have always been an issue across the board and it’s hard to build really robust data flows. And in a data mesh architecture, they can become amplified. If you have multiple data products and the domain teams are involved, which can lead to that complexity, it just keeps adding to the challenge.
So to address that, I think sometimes an investment is needed in a centralized monitoring solution that can provide that visibility and those insights into your data product usage, your data pipeline performance, your data quality, your system health, you name it.
So I think this should help identify those bottlenecks and other challenges early enough to ensure smooth operations across the data mesh.
And finally, the data product lifecycle management. As the number of products in an organization grows, it can naturally be challenging to manage the versioning, deprecation, retirement, keeping up to date all of these different data products.
To address this challenge, I think companies would need to foster an ability for their data teams to operate in a similar fashion to DevOps teams. I find that those types of principles also work very well with product management.
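One way to picture the lifecycle management described here is as a small state machine that a platform could enforce for every data product. The Python sketch below is only an assumption about what such a rule set might look like: the stage names and the allowed transitions are invented for illustration and were not specified in the conversation.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    RETIRED = "retired"


# Assumed policy: a product must be deprecated before it can be retired,
# so consumers get a warning period; a deprecation can still be reversed.
ALLOWED = {
    Stage.DRAFT: {Stage.ACTIVE},
    Stage.ACTIVE: {Stage.DEPRECATED},
    Stage.DEPRECATED: {Stage.ACTIVE, Stage.RETIRED},
    Stage.RETIRED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = advance(Stage.DRAFT, Stage.ACTIVE)
stage = advance(stage, Stage.DEPRECATED)
print(stage.value)

try:
    advance(Stage.ACTIVE, Stage.RETIRED)  # skips the deprecation step
except ValueError as err:
    print(err)
```

Encoding the transitions as data keeps the policy in one place, which fits the idea of central tooling that domain teams use on their own.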
eWeek: What key takeaways do you want companies to be aware of about data mesh?
Vishal Shah: If companies have multiple teams using the same data to create different solutions, it makes sense to use a data mesh architecture so that they have one way to look at the data.
Because what we saw, which gave us a big benefit, was people looking at the data and creating a report. Two teams were looking at the same data and creating a report, but they were giving different results – same data, but different results.
Now, we cannot say any one of them is wrong because both of them are applying their own kind of logic to the data. So technically both of the reports are right, but an executive will say, “Okay, which number should I trust?”
With the data mesh piece, those teams are all coming together to generate that product, which is going to be eventually used on the dashboard. That product will then have a single result going out with all the information. So then it’s one single source of truth for everything. Data mesh allows you to get a single source of truth on your data, which is trustworthy.
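The situation described here, two correct reports that disagree, can be shown with a toy Python example. The order rows, the two team-specific calculations and the agreed `published_revenue` definition are all made up for illustration; they are not figures or logic from Pitney Bowes.

```python
# Toy order data shared by both teams.
orders = [
    {"order_id": 1, "amount": 100.0, "status": "completed"},
    {"order_id": 2, "amount": 50.0, "status": "refunded"},
    {"order_id": 3, "amount": 75.0, "status": "completed"},
]


def team_a_revenue(rows):
    """Team A's own logic: count every order that was placed."""
    return sum(r["amount"] for r in rows)


def team_b_revenue(rows):
    """Team B's own logic: count only orders that were not refunded."""
    return sum(r["amount"] for r in rows if r["status"] == "completed")


def published_revenue(rows):
    """Hypothetical definition agreed by the cross-functional team and
    shipped with the data product, so every dashboard uses the same one."""
    return sum(r["amount"] for r in rows if r["status"] == "completed")


print(team_a_revenue(orders))     # 225.0
print(team_b_revenue(orders))     # 175.0
print(published_revenue(orders))  # 175.0
```

Neither team-specific function is wrong on its own terms; the difference disappears only when the definition is agreed once and published alongside the data.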
Ana Matei: I think companies can take away three key things to focus on if they’re looking to implement data mesh.
First and foremost, they have to determine if data mesh is truly the right approach for their organization. And we talked about a few ways they can do that earlier in our conversation.
If they determine it’s the right approach for them, I would recommend starting early, because starting early does reduce some of the complexity and level of effort required to potentially retrofit existing products into this new paradigm, this new concept.
And last, I would definitely recommend using a two-pronged approach similar to the one Capital One used, where they would build central policy and central tooling that will then enable federated data management. Because data mesh remains just a concept unless these organizations can provide self-service tools and automated workflows to operationalize this federated ownership of data.