ThomasWear
ThomasWear

Reputation: 132

How to structure DBT tables with cyclical dependencies

I have one table containing my members.

customer_id name age
1 John 74
2 Sarah 87

Everyday, I get a new table containing the current members.

Imagine that I get a new upload with the following rows

customer_id name age
2 Sarah 87
3 Melvin 23

I then want to generate the table

customer_id name age
Null Null 74
2 Sarah 87
3 Melvin 23

I don't want to nullify anything by mistake and therefore I want to run a few tests on this table before I replace my old one. The way I've done this is by creating a temporary table (let's call it customer_temp). However, I've now created a cyclical dependency since I:

Is there anyway I can do this using dbt?

Upvotes: 1

Views: 3156

Answers (1)

tconbeer
tconbeer

Reputation: 5805

Destroying data is tricky. I would avoid that unless it's necessary (e.g., DSAR compliance).

Assuming the new data is loaded into the same table in your database each day, I think this is a perfect candidate for snapshots, with the option for invalidating hard-deleted records. See the docs. This allows you to capture the change history of a table without any data loss.

If you turned on snapshots in the initial state, your snapshot table would look like (assuming the existing records had a timestamp of 1/1):

customer_id name age valid_from valid_to
1 John 74 1/1/2022
2 Sarah 87 1/1/2022

Then, after the source table was updated, re-running dbt snapshot (today) would create this table:

customer_id name age valid_from valid_to
1 John 74 1/1/2022 5/12/2022
2 Sarah 87 1/1/2022
3 Melvin 23 5/12/2022

You can create the format you'd like with a simple query:

select
    case when valid_to is null then customer_id else null end as customer_id,
    case when valid_to is null then name else null end as name,
    age
from {{ ref('my_snapshot') }}

Upvotes: 2

Related Questions