Erik
Erik

Reputation: 898

Datavault modeling where source data is slightly heterogenous

I am working with a database table that can have multiple types of data within it.

Here is a trivial example (Orders can be either b2c, b2b or internal transfers):

| OrderId | B2C Customer | B2B Customer | Internal | ShippingDate | OrderTotal | ... |
|---------|--------------|--------------|----------|--------------|------------|-----|
| 123     | Foo          | -            | -        | 1/1/2023     | 100        |     |
| 456     | -            | Bar          | -        | 1/2/2023     | 200        |     |
| 789     | -            | -            | Baz      | 1/3/2023     | 300        |     |

We already have Hubs and Satellites for Vendors (b2b customers) and Customers (b2c/d2c customers) from other source tables.

At first I was thinking of keeping a simple design of just a single new hub/sat for the "order header" info:

    ┌───────┐    
    │H_Order│    
    │ <Id>  │    
    └───────┘    
        │        
        ▼        
┌───────────────┐
│  HSAT_Order   │
│     <b2b>     │
│     <b2c>     │
│  <internal>   │
│   <etc...>    │
└───────────────┘

I like this because it is simple and sticks to loading data unmodified from the source level. If you wanted to differentiate the different types of orders you could do that at the mart level and apply filters to your raw vault.

Is there an argument to be made for loading this source data into new links and satellites?

                          ┌───────┐                          
                          │H_Order│                          
                 ┌────────│ <Id>  │────────┐                 
                 │        └───────┘        │                 
                 │                         │                 
                 ▼                         ▼                 
        ┌────────────────┐         ┌───────────────┐         
        │L_ORDER_CUSTOMER│         │L_ORDER_VENDOR │         
        └────────────────┘         └───────────────┘         
                 │                         │                 
             ┌───┘                         └────┐            
             ▼                                  ▼            
┌────────────────────────┐         ┌────────────────────────┐
│                        │         │                        │
│  LSAT_Order_Customer   │         │   LSAT_Order_VENDOR    │
│   <descriptive info>   │         │   <descriptive info>   │
│                        │         │                        │
└────────────────────────┘         └────────────────────────┘

The only thing that feels off is that we are applying filters/logic to the source data to create the different entities within the raw vault and that the lsat tables have the same columns.

Upvotes: 0

Views: 156

Answers (1)

data_modeler
data_modeler

Reputation: 196

First, it looks like what you are building is a Transactional Link (or No-History Link), assuming that an Order in the Orders table never changes so there's no history to track.

Basically a T-Link just allows you to skip creating a Hub and Satellite and just put everything into the Link since, with no history, there would be a 1:1:1 relationship between the hub, link, and satellite (with regard to Order).

I've mocked up what I think it should look like.

(See this for the interactive version)

But even if you need to create a hub and satellite for Orders, it seems that the "unit of work" is that the order is placed. So, in short, I see that you would have 1 link (although, the creator of DV suggests that sometimes modelers break up a link like this into multiple smaller links, so it may be up to you and your needs; see reference below).

Reference: Building a Scalable Data Warehouse with Data Vault 2.0; p 109 for discussion on unit of work; p 132 for discussion on Nonhistorized Links

Upvotes: 0

Related Questions