Reputation: 132
I have one table containing my members.
customer_id | name | age |
---|---|---|
1 | John | 74 |
2 | Sarah | 87 |
Everyday, I get a new table containing the current members.
Imagine that I get a new upload with the following rows
customer_id | name | age |
---|---|---|
2 | Sarah | 87 |
3 | Melvin | 23 |
I then want to generate the table
customer_id | name | age |
---|---|---|
Null | Null | 74 |
2 | Sarah | 87 |
3 | Melvin | 23 |
I don't want to nullify anything by mistake and therefore I want to run a few tests on this table before I replace my old one. The way I've done this is by creating a temporary table (let's call it customer_temp). However, I've now created a cyclical dependency since I:
Is there anyway I can do this using dbt?
Upvotes: 1
Views: 3156
Reputation: 5805
Destroying data is tricky. I would avoid that unless it's necessary (e.g., DSAR compliance).
Assuming the new data is loaded into the same table in your database each day, I think this is a perfect candidate for snapshots, with the option for invalidating hard-deleted records. See the docs. This allows you to capture the change history of a table without any data loss.
If you turned on snapshots in the initial state, your snapshot table would look like (assuming the existing records had a timestamp of 1/1):
customer_id | name | age | valid_from | valid_to |
---|---|---|---|---|
1 | John | 74 | 1/1/2022 | |
2 | Sarah | 87 | 1/1/2022 |
Then, after the source table was updated, re-running dbt snapshot
(today) would create this table:
customer_id | name | age | valid_from | valid_to |
---|---|---|---|---|
1 | John | 74 | 1/1/2022 | 5/12/2022 |
2 | Sarah | 87 | 1/1/2022 | |
3 | Melvin | 23 | 5/12/2022 |
You can create the format you'd like with a simple query:
select
case when valid_to is null then customer_id else null end as customer_id,
case when valid_to is null then name else null end as name,
age
from {{ ref('my_snapshot') }}
Upvotes: 2