Reputation: 91
I'm doing research on data federation, data fabrics, and data meshes, and I've come across two terms that seem eerily similar to each other: data virtualization and data orchestration. There's a lot of content on Google available on both topics, but the two really sound similar. I know data virtualization revolves around a specific technology, while data orchestration is more principle-based, but it sounds like they're both tackling the same issue: taking data from different sources and locations, combining it, and making it ready for analysis. Apologies in advance if this isn't the right place to ask, but I didn't know where else to turn.
Upvotes: 0
Views: 315
Reputation: 311
Data virtualization is about managing data so that an application can retrieve and manipulate it while the technical details, such as how the data is formatted or where it is physically located, stay encapsulated.
Data orchestration, by contrast, is more of a process: gathering siloed data from various locations across the company, organizing it into a consistent, usable format, and activating it for use by data-analysis tools.
In short, the two take different approaches and suit different applications within data management and processing. That's the simple, straightforward version of the distinction.
Now let's expand on the differences. We'll start with data virtualization. The goal of data virtualization is to create a single representation of data from multiple, disparate sources without having to copy or move the data. Data virtualization software aggregates structured and unstructured data sources for virtual viewing through a dashboard or visualization tool. The software makes metadata about the data discoverable, but hides the complexities of accessing disparate data types from different sources. It is important to note that data virtualization does not replicate data from source systems; it simply stores metadata and integration logic for viewing. Vendors who specialize in this type of software include IBM, SAP, Denodo Technologies, Oracle, TIBCO Software, Amazon, Google, Microsoft and Red Hat.
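To make the "metadata and integration logic only, no copying" idea concrete, here is a minimal sketch in Python. The two sources (an in-memory CRM dict and a CSV-formatted billing feed) and all names are hypothetical; a real virtualization product would federate live databases, but the principle is the same: the virtual layer stores only a catalog and join logic, and fetches rows from the sources at query time.

```python
import csv
import io

# Hypothetical source systems; in practice these would be remote databases.
CRM_SOURCE = {"c1": {"name": "Acme"}, "c2": {"name": "Globex"}}
BILLING_CSV = "customer_id,amount\nc1,100\nc2,250\n"

class VirtualView:
    """A unified, read-only view over heterogeneous sources."""

    def __init__(self):
        # Metadata only: which source owns each field, not the data itself.
        self.catalog = {"name": "crm", "amount": "billing"}

    def _billing(self):
        # Parse the billing feed lazily each time it is queried;
        # nothing is replicated into the virtual layer.
        return {row["customer_id"]: float(row["amount"])
                for row in csv.DictReader(io.StringIO(BILLING_CSV))}

    def query(self, customer_id):
        # Join across the two sources at query time.
        billing = self._billing()
        return {"name": CRM_SOURCE[customer_id]["name"],
                "amount": billing[customer_id]}

view = VirtualView()
print(view.query("c1"))  # {'name': 'Acme', 'amount': 100.0}
```

The key design point is that `VirtualView` holds no rows of its own: delete the object and no data is lost, because everything still lives in the source systems.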
Data orchestration, on the other hand, brings automation to the process of moving data from source to storage by configuring multiple pipeline tasks into a single end-to-end process.
Data orchestration happens in three distinct phases:
The organization phase, in which data orchestration tools gather and organize data pipelines.
The transformation phase, in which various fragmented data is converted to a consistent, accessible, and usable format.
The activation phase, in which data orchestration tools deliver usable data to analysis and visualization tools.
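The three phases above can be sketched as a single end-to-end pipeline. This is only an illustration with made-up task names and toy data; real orchestrators such as Airflow or Dagster add scheduling, dependency graphs, and retries on top of this basic chaining.

```python
def organize():
    # Organization phase: gather siloed data from two hypothetical locations.
    return [{"source": "sales", "rev": "1200"},
            {"source": "web", "rev": "300"}]

def transform(records):
    # Transformation phase: convert fragmented records into a
    # consistent, usable format (here, numeric revenue values).
    return [{"source": r["source"], "revenue": float(r["rev"])}
            for r in records]

def activate(records):
    # Activation phase: deliver usable data to an analysis step,
    # here a simple total-revenue aggregation.
    return sum(r["revenue"] for r in records)

def run_pipeline():
    # The orchestrator's job: chain the tasks into one end-to-end process.
    return activate(transform(organize()))

print(run_pipeline())  # 1500.0
```

Note how the orchestration layer itself holds no business logic; it only wires the tasks together, which is why it is often described as the connective tissue of the data stack.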
While data orchestration tools might not be required for a pipeline to be considered “functional,” they’re nonetheless an essential component of the modern data stack, and serve as the connective tissue among various data warehouses.
Upvotes: 0