Reputation: 1
We set up dbt snapshots to build a Redshift schema that tracks slowly changing dimensions. Our dbt snapshots read from a Spectrum external schema partitioned by year | month | day in an S3 bucket.
Everything works fine when we snapshot day by day going forward, but we cannot manage to load historical data with the correct change dates.
To retrieve historical data, our dbt snapshot selects where the partition = date_to_snapshot, BUT `dbt_valid_from` and `dbt_valid_to` are set to the current execution date, not the date of the partition we snapshot.
Is it possible to set `dbt_valid_from` and `dbt_valid_to` so that they reflect the date we are snapshotting?
Thanks for your help
Upvotes: 0
Views: 2399
Reputation: 5815
There are two "strategies" for a snapshot: `timestamp` and `check`.

The `timestamp` strategy relies on having an `updated_at` field in the source, and is the recommended way to snapshot a table. If you use the `timestamp` strategy, the `dbt_valid_from` and `dbt_valid_to` fields will be populated as you want. See the docs.
If you don't have an `updated_at` field in your source, or if that field isn't reliable and you need to use `check`, then you should build a model on top of your snapshot that calculates the timestamp you want (although if you don't have an `updated_at` field, I don't know how you would know when that record changed) — see the sketch below. You should also consider snapshotting more frequently (I typically configure a separate "job" to run `dbt snapshot` that runs much more frequently than the job that runs `dbt build`).
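A rough sketch of such a downstream model, assuming the snapshotted rows still carry the Spectrum partition columns `year`, `month`, and `day` as zero-padded strings and that the snapshot is named `orders_snapshot` (all hypothetical names):

```sql
-- Hypothetical model built on top of the snapshot: reconstruct the S3
-- partition date and expose it alongside dbt_valid_from.
with snap as (

    select * from {{ ref('orders_snapshot') }}

)

select
    snap.*,
    -- Rebuild the partition date, e.g. year='2023', month='01', day='15'
    to_date(year || '-' || month || '-' || day, 'YYYY-MM-DD') as partition_date,
    -- Prefer the partition date as the "effective from" timestamp when present
    coalesce(
        cast(to_date(year || '-' || month || '-' || day, 'YYYY-MM-DD') as timestamp),
        dbt_valid_from
    ) as effective_valid_from
from snap
```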
Upvotes: 1