Datavault modeling where source data is slightly heterogenous

Question

I am working with a database table that can have multiple types of data within it.

Here is a trivial example (Orders can be either b2c, b2b or internal transfers):

| OrderId | B2C Customer | B2B Customer | Internal | ShippingDate | OrderTotal | ... |
|---------|--------------|--------------|----------|--------------|------------|-----|
| 123     | Foo          | -            | -        | 1/1/2023     | 100        |     |
| 456     | -            | Bar          | -        | 1/2/2023     | 200        |     |
| 789     | -            | -            | Baz      | 1/3/2023     | 300        |     |

We already have Hubs and Satellites for Vendors (b2b customers) and Customers (b2c/d2c customers) from other source tables.

At first I was thinking of keeping a simple design of just a single new hub/sat for the "order header" info:

    ┌───────┐    
    │H_Order│    
    │   │    
    └───────┘    
        │        
        ▼        
┌───────────────┐
│  HSAT_Order   │
│          │
│          │
│     │
│       │
└───────────────┘

I like this because it is simple and sticks to loading data unmodified from the source level. If you wanted to differentiate the different types of orders you could do that at the mart level and apply filters to your raw vault.

Is there an argument to be made for loading this source data into new links and satellites?

                          ┌───────┐                          
                          │H_Order│                          
                 ┌────────│   │────────┐                 
                 │        └───────┘        │                 
                 │                         │                 
                 ▼                         ▼                 
        ┌────────────────┐         ┌───────────────┐         
        │L_ORDER_CUSTOMER│         │L_ORDER_VENDOR │         
        └────────────────┘         └───────────────┘         
                 │                         │                 
             ┌───┘                         └────┐            
             ▼                                  ▼            
┌────────────────────────┐         ┌────────────────────────┐
│                        │         │                        │
│  LSAT_Order_Customer   │         │   LSAT_Order_VENDOR    │
│      │         │      │
│                        │         │                        │
└────────────────────────┘         └────────────────────────┘

The only thing that feels off is that we are applying filters/logic to the source data to create the different entities within the raw vault and that the lsat tables have the same columns.

Datavault modeling where source data is slightly heterogenous

Answers (1)

Related Questions