BarneyL

Reputation: 1362

What is the Best Practice Approach To Handle Driving Key Relationships in Data Vault v2.0.2

We are in the process of implementing a new data store using the Data Vault methodology in Snowflake. Our intention is to hold to the latest standards and best practices as far as we can, e.g. an insert-only approach, and to avoid anti-patterns such as driving key relationships wherever practicable (see the comments here for a discussion of driving keys).

The following is a simplified example of a section of our data relating to ratings assigned to properties over time (think hotel star ratings or similar).

Central to this is a table connecting a property to a rating. The following example shows the rating history for a single property against two different schemes.

PropertyRatingID  PropertyID   RatingSchemeID RatingID   EffectiveDate
1                 1            1              1          2020-01-01
2                 1            1              2          2020-01-02
3                 1            1              1          2020-01-03
4                 1            2              3          2020-01-02
5                 1            2              4          2020-01-03

Relevant information regarding the data structure.

A timeline of the above can be pictured as follows.

Date        Scheme1Rating  Scheme2Rating
2020-01-01  1              NULL
2020-01-02  2              3
2020-01-03  1              4
2020-01-04  1              4

My initial attempt to model this was to build a hub for RatingID, a link between the Property and Rating hubs, and a multi-active satellite attached to the link, keyed on PropertyRatingID and holding all other information (primarily RatingSchemeID and EffectiveDate). This proved very difficult to use because the driving key behind changes (PropertyID and RatingSchemeID) was spread between the link and the satellite.

In terms of bitemporality, the focus will be on obtaining the most recently recorded set of effective dates (i.e. the latest system date) to create a history of ratings over time, e.g. EffectiveEndDate becomes the equivalent of LEAD(EffectiveDate) OVER (PARTITION BY PropertyID, RatingSchemeID ORDER BY EffectiveDate ASC) on the raw table. While we will not make regular use of values recorded at past system times, we will occasionally look at these on an ad-hoc basis to explain changes to the history between reporting periods.
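
For illustration, the derivation we have in mind looks roughly like the following Snowflake SQL sketch, written against a hypothetical staging table raw_property_rating holding the latest-loaded rows with the columns shown above:

-- Sketch only: raw_property_rating is a hypothetical staging table with the
-- columns from the example, already filtered to the latest system date.
SELECT
    PropertyID,
    RatingSchemeID,
    RatingID,
    EffectiveDate,
    LEAD(EffectiveDate) OVER (
        PARTITION BY PropertyID, RatingSchemeID
        ORDER BY EffectiveDate ASC
    ) AS EffectiveEndDate   -- NULL for the rating currently in effect
FROM raw_property_rating;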

A simple solution would be to join across multiple tables within the source system to flatten the data, separate it by scheme, and produce a satellite per rating scheme attached directly to the Property hub. This would give us a short-term solution but would be inflexible (requiring a new satellite for any new rating scheme) and would still require these satellites to be multi-active to hold the multiple effective dates current in the source system.
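
A sketch of what that flattening might look like, again against the hypothetical raw_property_rating table; each rating scheme would need its own copy of this query feeding its own satellite on the Property hub:

-- Sketch only: one query (and one satellite) per rating scheme.
SELECT
    PropertyID,
    RatingID      AS Scheme1RatingID,
    EffectiveDate
FROM raw_property_rating
WHERE RatingSchemeID = 1;   -- duplicated (or templated) for each further scheme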

I believe the ideal solution requires at least one more hub relating to the driving key, and potentially a second relating to the assignment of a rating to a property. Much of my reading (see the previous link and this article) implies my satellite should be attached to a hub rather than a link.

What would be an effective approach to model this using the Data Vault methodology?

I would also be interested in a discussion of the trade-offs of the proposed solution, for example additional "weak" hubs versus resolving driving key issues within more complex queries.
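
As one side of that comparison, resolving the driving key purely in queries against my original link and multi-active satellite would look something like the sketch below (table and column names are my own assumptions, and I assume the business keys are carried alongside the hash keys for readability):

-- Sketch only: current rating per driving key (PropertyID, RatingSchemeID),
-- resolved at query time against the original link/multi-active satellite.
SELECT
    l.property_id,
    s.rating_scheme_id,
    l.rating_id,
    s.effective_date
FROM link_property_rating AS l
JOIN sat_property_rating  AS s
  ON s.property_rating_hk = l.property_rating_hk
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY l.property_id, s.rating_scheme_id   -- the driving key, split across link and sat
    ORDER BY s.effective_date DESC, s.load_date DESC
) = 1;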

Upvotes: 1

Views: 1122

Answers (1)

Keith

Reputation: 11

This scenario, as I understand it, is more like a LINK/SAT with PropertyID and RatingSchemeID as the LINK natural key (linking to the Property and RatingScheme HUBs) and RatingID in the SAT (hanging off the LINK).
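
Concretely, the shape I have in mind is roughly the following (a minimal sketch; the names, types and choice of hash keys are illustrative assumptions rather than anything prescribed by the standard):

-- Sketch only: link keyed on the driving key, descriptive attributes in the SAT.
CREATE TABLE link_property_rating_scheme (
    property_rating_scheme_hk  BINARY(32)    NOT NULL,  -- hash of PropertyID || RatingSchemeID
    property_hk                BINARY(32)    NOT NULL,  -- references hub_property
    rating_scheme_hk           BINARY(32)    NOT NULL,  -- references hub_rating_scheme
    load_date                  TIMESTAMP_NTZ NOT NULL,
    record_source              VARCHAR       NOT NULL
);

CREATE TABLE sat_property_rating_scheme (
    property_rating_scheme_hk  BINARY(32)    NOT NULL,  -- references the link above
    load_date                  TIMESTAMP_NTZ NOT NULL,
    hash_diff                  BINARY(32)    NOT NULL,
    rating_id                  NUMBER,                  -- the rating assigned
    effective_date             DATE,
    record_source              VARCHAR       NOT NULL
);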

Upvotes: 1
