Stephan Eggermont
Stephan Eggermont

Reputation: 15907

When is the Data Vault model the right model for a data-warehouse?

I recently found a reference to 'Data Vault Modeling' as a model for data-warehouses. The models I've seen before are Inmon and Kimball. The author refers to possible performance problems due to the joins needed. It looks like a nice model, but I wonder about the gotcha's. Are there any experience reports on-line?

Upvotes: 3

Views: 1593

Answers (2)

Marcus D
Marcus D

Reputation: 1134

We have been using a home-grown modification to Data Vault for a number of years, called 'Link Modelling', which only has entities and links; drawing principles from neo4j, but implementing in a SQL database.

Both Link Modelling and Data Vault are very different ways of thinking to Kimball/Inmon models.

My comments below relate to a system built with the follow structure: a temporary staging database, a DWH, then a number of marts build from the DWH. There are other ways to architect a DWH solution, but this is quite typical.

With Kimball/Inmon

  • Data is cleaned on the way into the DWH, but sometimes applied on the way into the staging database
  • Business rules and MDM are (generally) applied between the staging db and the DWH
  • The marts are often subject area specific

With Data Vault/Link Modelling

  • Data is landed unchanged in staging
  • These data are passed through to the DWH also uncleaned, but stored in an entity/link form
  • Data cleansing, MDM and business rules are applied between the DWH and the marts.
  • Marts are based on subject area specific needs (same as above).
  • For us, we would often (but not always) build Kimball Star Schema style Marts, as the end users understand the data structures of these easily.

The occasions where a Link Modelled DWH comes into its own, are the following (using Kimball terminology to express the issues)

  • Upon occasion, there will be queries from the users asking 'why is a specific number having this value?'. In traditional Kimball/Inmon, data is cleansed on the way in, there is no way to know what the original value was. Link Model has the original data in the DWH.
  • When no transaction records exist that link a number of dimensions, and it is required to be able to report on the full set of data, so e.g. ask questions like 'How many insurance policies that were sold by a particular broker have no claim transactions paid?'.
  • The application of MDM in a type 2 Kimball or Inmon DWH can cause massive numbers of type 2 change records to be written to Dimensions, which often contain all the data values, so there is a lot of duplication of data. With a Link Model/Data Vault, a new dimensional value will just cause new type 2 links to be created in a link table, which only have foreign keys to entity tables. This is often overcome in Kimball DWH by having a slowly changing dimension and a fast changing dimension, which is a fair workaround.
  • In Insurance and other industries where there is the need to be able to produce 'as at date' reports, Fact tables will be slowly changing as well, type 2 dimension tracking against type 2 fact records are a nightmare.
  • From a development point of view, adding a new column to a large Kimball dimension needs to be done carefully and consideration of back-populating is important, but with a Link Model, adding an extra column to an Entity is relatively trivial.

There are always ways around these in Kimball methodology, but they require some careful thought and sometimes some jumping through hoops.

From our perspective, there is little downside to the Link Modelling.

I am not connected with any of the companies marketing/producing Kimball/Inmon or Data Vault methodologies.

Upvotes: 2

Dan Linstedt
Dan Linstedt

Reputation: 26

You can find a whole lot more information on my blog: http://danLinstedt.com, and on the forums at datavaultinstitute dot com

But to give you a quick/brief answer to your question:

The gotchas are as follows:

1) Have to accept the concept of loading raw data to the data warehouse 2) Understand that the Data Vault usually doesn't allow "end-users" direct access because of the model.

There may be a few more, but the benefits outweigh the drawbacks.

Feel free to check out the blog, it's free to register/follow.

Cheers, Dan Linstedt

Upvotes: 1

Related Questions