Reputation:
I was just reading about Data Vault modeling and as far as I understand it, the hub does only contain keys (and the record source). So I was wondering why I should create those hub tables, only to store the record source? Wouldn't it be enough to have only Satellites and Links?
Btw: I'm looking for simple mysql tables in a data vault form to download and play with.
Upvotes: 3
Views: 2372
Reputation: 586
The hub is where the passive integration of multiple sources is applied. You would have a column for data source and record all instances of each key as it first arrives in your hub. Example, if I have a CRM system and an ERP system and I sync the data from the CRM system first, then the ERP data comes available. I would add all of the keys from the CRM system, with the data source column value of "CRM". Then when I bring in the ERP system, assuming I have the same structure of keys for the table, I would only add the new keys that only exist in the ERP system with a data-source of "ERP". If the keys are different, you would have to add all of the data from both systems. The point is that you are retaining all of the data from all of the systems in play. When you move to your next layer, be it a Business Data Vault or a Data Mart, you would apply business logic against the hub and satellites according to the "Business Rules" to get to a single resulting row for the two systems where applicable. If you employ the transformation before storing it in this intermediate state, you lose the audit-ability, and the ability to change the business rules at a later date. Make sense?
Upvotes: 3
Reputation: 8239
One of the main concepts of Data Vault modeling is the separation of Business Keys, Satellites for detail data and Links to connect Hubs.
Example
Employee
--------
Personnel Number
Name
Surname
Street
City
Department
--------
ID
Shortcode
Name
Employee Number
Imagine that one department only has one employee.
Business Keys
Now the business identifiers for the business objects Employee and Department need to be identified. This would be Personnel Number for the Employee and Shortcode for Department.
Why not ID for Department? Well, the ID is most probably a database internal ID. The shortcode is in this example something like DEP_A1613
, which is also internally used to identify the department.
Modelling
The hub for Employee consists only of the field Personnel Number and the hub for Department only of Shortcode.
This means the Hub in Data Vault modelling is for storing the business key only. Of course, Data Vault fields like Record Source, Load Date and other are needed as well. Both Hubs would also have corresponding Satellites for the describing data. It would be a violation of the Data Vault modeling technique to link Satellites together without Hubs. It wouldn't make sense, either: you need some kind of common identifier for your Satellite data which wouldn't be there if you would omit the Hub.
Conclusion
So to answer your question: You should model Hubs for business keys. Absolutely. Hubs are in fact an essential element of Data Vault modeling. Links are only connected to Hubs, not to Satellites.
Imagine a change in the Employee software. All other fields are now stored in the Employee satellite. When using a new source Employee software you could store all data in a new satellite whilst using the same Hub and business key.
Just to complete this example: the link would connect Employee and Department from Department with Employee Number.
EDIT
So for example the structure would look like this. Data Vault specific fields are marked with [DV]:
Hub Employee
------------
Employee Hash Key [DV]
Load Date [DV]
Record Source [DV]
Personnel Number
Sat Employee
------------
Employee Hash Key [DV]
Load Date [DV]
Load End Date [DV]
Record Source [DV]
Hash Diff [DV]
Name
Surname
Street
City
Link Employee Department
------------------------
Employee Department Hash Key [DV]
Employee Hash Key [DV]
Department Hash Key [DV]
Hub Department
--------------
Department Hash Key [DV]
Load Date [DV]
Record Source [DV]
Shortcode
Sat Department
--------------
Department Hash Key [DV]
Load Date [DV]
Load End Date [DV]
Record Source [DV]
Hash Diff [DV]
ID
Name
Upvotes: 3