DBT: set valid_from and valid_to date when retrieving historical data with dbt snapshots

We set up dbt snapshots to build a redshift schema to track slow changing dimension. Our dbt snapshots crawl spectrum external schema partitionned by year | month | day in s3 bucket.

Everyhing works fine if we start to use day by day. But we cannot succeed to get historical data with its right changing date.

To retrieve historical data, in our dbt snapshot, we select where the partition = date_to_snapshot BUT the dbt_valid_from and the dbt_valid_to are set to the current day of execution and not the date of the partition we snapshot.

Is it possible to set the dbt_valid_from and the dbt_valid_to in a way that reflects the date we snapshot.

Thanks for your help

Upvotes: 0

Views: 2399

Answers (1)

tconbeer
tconbeer

Reputation: 5815

There are two "strategies" for a snapshot, timestamp and check.

The timestamp strategy relies on having an updated_at field in the source, and is the recommended way to snapshot a table. If you use the timestamp strategy, the dbt_valid_from and dbt_valid_to fields will be populated as you want. See the docs.

If you don't have an updated_at field in your source, or if that field isn't reliable and you need to use check, then you should build a model on top of your snapshot that calculates the timestamp you want (although if you don't have an updated_at field, I don't know how you would know when that record changed). You should also consider snapshotting more frequently (I typically configure a separate "job" to run dbt snapshot that runs much more frequently than the job that runs dbt build)

Upvotes: 1

Related Questions