Kumari Ollu

Reputation: 49

How can we preserve provenance and lineage in MarkLogic?

How can we preserve provenance and lineage in MarkLogic?

What is the use case for the envelope pattern?

Is there any approach to track data lineage while exporting data from data sources?

Upvotes: 1

Views: 170

Answers (1)

Mads Hansen

Reputation: 66783

You might be interested in the MarkLogic "Tracking Data Provenance" on-demand tutorial:

In episode 1 of the Data Governance series you will explore the concept of data provenance. You will learn how tracking data provenance, or the origin of data, is critical to understanding data and its lineage. You will get hands-on and learn how to achieve this goal when integrating data silos using the MarkLogic Data Hub Framework and the envelope pattern.

The concept is applied in MarkLogic Data Hub:

MarkLogic Data Hub Provenance and Lineage

In MarkLogic, provenance tracks the origin of the data and lineage is the history of the data. Provenance metadata is the combined set of provenance information and lineage information tracked by MarkLogic Data Hub. Provenance information is updated with every change made to the record from ingestion through its lifetime in the MarkLogic Server.

All provenance and lineage information is stored as XML documents (using the PROV XML schema) in the data-hub-JOBS database and is added to the protected collection http://marklogic.com/provenance-services/record. When provenance and lineage records are created, triples that define the relationships among the pieces of information are also generated.
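To make the distinction above concrete, here is a minimal Python sketch of provenance as a fixed origin plus lineage as an append-only history. It is only a conceptual model: the class names, field names, and sample values are hypothetical and are not how Data Hub actually stores its PROV XML records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageStep:
    """One entry in the history of a record (hypothetical shape)."""
    step: str        # e.g. "ingestion", "mapping"
    timestamp: str   # ISO 8601 string, supplied by the caller


@dataclass
class ProvenanceRecord:
    """Origin (provenance) plus history (lineage) for one document."""
    document_uri: str
    origin: str                                   # where the data came from
    lineage: List[LineageStep] = field(default_factory=list)

    def record_change(self, step: str, timestamp: str) -> None:
        # Lineage is append-only: every change adds a step, nothing is rewritten.
        self.lineage.append(LineageStep(step, timestamp))


# Hypothetical sample values, for illustration only.
record = ProvenanceRecord("/customers/42.json", "crm-export.csv")
record.record_change("ingestion", "2020-01-01T00:00:00Z")
record.record_change("mapping", "2020-01-01T00:05:00Z")
print(record.origin, [s.step for s in record.lineage])
```

The origin never changes after ingestion, while the lineage list grows with each processing step, which mirrors the "updated with every change" behavior described in the documentation excerpt.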

The design pattern is explained in this blog post:

Triple Provenance with Document Annotations Design Pattern

When building applications that leverage data from disparate sources, especially in a semantics context, it is common to want to capture provenance information, such as source and last updated time.

Using the envelope pattern, annotate the JSON/XML serialization of triples with provenance details.
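As a rough illustration of that idea, the Python sketch below wraps a single triple in a JSON envelope and puts the provenance details (source and last updated time) next to it. The "triple" object with subject, predicate, and object follows MarkLogic's JSON triple serialization; the "envelope", "headers", "source", and "lastUpdated" names, the helper function, and the sample IRIs are assumptions made for this example and may differ from what the blog post or Data Hub uses.

```python
import json
from datetime import datetime, timezone


def annotate_triple(subject, predicate, obj, source, last_updated):
    """Wrap one triple in an envelope that also carries provenance details.

    The header field names are hypothetical; adapt them to your own model.
    """
    return {
        "envelope": {
            # Provenance annotations live beside the data, not inside it.
            "headers": {
                "source": source,
                "lastUpdated": last_updated.isoformat(),
            },
            # The triple itself stays in its plain serialized form.
            "triples": [
                {
                    "triple": {
                        "subject": subject,
                        "predicate": predicate,
                        "object": obj,
                    }
                }
            ],
        }
    }


# Hypothetical sample values, for illustration only.
doc = annotate_triple(
    "http://example.org/person/ada",
    "http://example.org/worksFor",
    "http://example.org/org/acme",
    source="crm-export",
    last_updated=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

Because the annotations and the triple travel in the same document, a query that finds the triple can also report where it came from and when it was last updated.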

Upvotes: 1
