BarneyL

Reputation: 1362

What is the Best Practice Approach To Handle Driving Key Relationships in Data Vault v2.0.2

We are in the process of implementing a new data store using the Data Vault methodology in Snowflake. Our intention is to hold to the latest standards and best practices as far as we can, e.g. an insert-only approach, and to avoid anti-patterns such as driving key relationships wherever practicable (see the comments here for a discussion of driving keys).

The following is a simplified example of a section of our data relating to ratings assigned to properties over time (think hotel star ratings or similar).

Central to this is a table connecting a property to a rating. The following example shows the rating history for a single property against two different schemes.

PropertyRatingID  PropertyID   RatingSchemeID RatingID   EffectiveDate
1                 1            1              1          2020-01-01
2                 1            1              2          2020-01-02
3                 1            1              1          2020-01-03
4                 1            2              3          2020-01-02
5                 1            2              4          2020-01-03

Relevant information regarding the data structure.

A timeline of the above can be pictured as follows.

Date        Scheme1Rating  Scheme2Rating
2020-01-01  1              NULL
2020-01-02  2              3
2020-01-03  1              4
2020-01-04  1              4

My initial attempt to model this was to build a hub for RatingID, a link between the Property and Rating hubs, and a multi-active satellite attached to the link, keyed on PropertyRatingID and holding all other information (primarily RatingSchemeID and EffectiveDate). This proved very difficult to use because the driving key behind changes (PropertyID and RatingSchemeID) was spread between the link and the satellite.

In terms of bitemporality, the focus will be on obtaining the most recently recorded set of effective dates (i.e. the latest system date) to create a history of ratings over time, e.g. EffectiveEndDate becomes the equivalent of LEAD(EffectiveDate) OVER (PARTITION BY PropertyID, RatingSchemeID ORDER BY EffectiveDate ASC) on the raw table. While we will not make regular use of values recorded at past system times, we will occasionally look at these on an ad-hoc basis to explain changes to the history between reporting periods.
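
For illustration, the derivation we have in mind looks roughly like the following Snowflake SQL sketch, written against a hypothetical staging table raw_property_rating holding the latest-loaded rows with the columns shown above:

-- Sketch only: raw_property_rating is a hypothetical staging table with the
-- columns from the example, already filtered to the latest system date.
SELECT
    PropertyID,
    RatingSchemeID,
    RatingID,
    EffectiveDate,
    LEAD(EffectiveDate) OVER (
        PARTITION BY PropertyID, RatingSchemeID
        ORDER BY EffectiveDate ASC
    ) AS EffectiveEndDate   -- NULL for the rating currently in effect
FROM raw_property_rating;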

A simple solution would be to join across multiple tables within the source system to flatten the data, separate it by scheme, and produce a satellite per rating scheme attached directly to the Property hub. This would give us a short-term solution but would be inflexible (requiring a new satellite for any new rating scheme) and would still require these satellites to be multi-active to hold the multiple effective dates current in the source system.
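
A sketch of what that flattening might look like, again against the hypothetical raw_property_rating table; each rating scheme would need its own copy of this query feeding its own satellite on the Property hub:

-- Sketch only: one query (and one satellite) per rating scheme.
SELECT
    PropertyID,
    RatingID      AS Scheme1RatingID,
    EffectiveDate
FROM raw_property_rating
WHERE RatingSchemeID = 1;   -- duplicated (or templated) for each further scheme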

I believe the ideal solution requires at least one more hub relating to the driving key, and potentially a second relating to the assignment of a rating to a property. Much of my reading (see the previous link and this article) implies my satellite should be attached to a hub rather than a link.

What would be an effective approach to model this using the Data Vault methodology?

I would also be interested in a discussion of the trade-offs of the proposed solution, for example additional "weak" hubs versus resolving driving key issues within more complex queries.
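
As one side of that comparison, resolving the driving key purely in queries against my original link and multi-active satellite would look something like the sketch below (table and column names are my own assumptions, and I assume the business keys are carried alongside the hash keys for readability):

-- Sketch only: current rating per driving key (PropertyID, RatingSchemeID),
-- resolved at query time against the original link/multi-active satellite.
SELECT
    l.property_id,
    s.rating_scheme_id,
    l.rating_id,
    s.effective_date
FROM link_property_rating AS l
JOIN sat_property_rating  AS s
  ON s.property_rating_hk = l.property_rating_hk
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY l.property_id, s.rating_scheme_id   -- the driving key, split across link and sat
    ORDER BY s.effective_date DESC, s.load_date DESC
) = 1;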

Upvotes: 1

Views: 1122

Answers (1)

Keith

Reputation: 11

This scenario, as I understand it, is more like a LINK/SAT with PropertyID and RatingSchemeID as the LINK natural key (linking to the Property and RatingScheme HUBs) and RatingID in the SAT (hanging off the LINK).
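
Concretely, the shape I have in mind is roughly the following (a minimal sketch; the names, types and choice of hash keys are illustrative assumptions rather than anything prescribed by the standard):

-- Sketch only: link keyed on the driving key, descriptive attributes in the SAT.
CREATE TABLE link_property_rating_scheme (
    property_rating_scheme_hk  BINARY(32)    NOT NULL,  -- hash of PropertyID || RatingSchemeID
    property_hk                BINARY(32)    NOT NULL,  -- references hub_property
    rating_scheme_hk           BINARY(32)    NOT NULL,  -- references hub_rating_scheme
    load_date                  TIMESTAMP_NTZ NOT NULL,
    record_source              VARCHAR       NOT NULL
);

CREATE TABLE sat_property_rating_scheme (
    property_rating_scheme_hk  BINARY(32)    NOT NULL,  -- references the link above
    load_date                  TIMESTAMP_NTZ NOT NULL,
    hash_diff                  BINARY(32)    NOT NULL,
    rating_id                  NUMBER,                  -- the rating assigned
    effective_date             DATE,
    record_source              VARCHAR       NOT NULL
);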

Upvotes: 1
