Reputation: 111
Can someone help me understand the behavior of my snapshot please?
I created a snapshot and scheduled it to run once daily at 10 AM UTC. It takes less than 4 minutes to run (along with a few other snapshots)
However, when I query
select distinct dbt_from_date from mysnapshot where to_date(dbt_from_date) = '2021-10-07'
(see screenshot 1) ,
the result is 993 rows spanning 8 hours of the day. I would expect that the result would be only one row with the time when the snapshot ran. Or at worst, the span should not be more than the 4 min it took to build the snapshot.
This is the code of my snapshot
{% snapshot XYZ_snapshot %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
invalidate_hard_deletes=True
)
}}
select * from {{ source('XYZ', 'ABC') }}
{% endsnapshot %}
Upvotes: 2
Views: 919
Reputation: 103
According to the DBT Snapshots documentation, it states that "For the timestamp strategy, the configured updated_at column is used to populate the dbt_valid_from, dbt_valid_to and dbt_updated_at columns." This means that your dbt_valid_from field is being populated from the "updated_at" field in your data source, rather than when the snapshot is run. This results in your values spanning a greater period than the 4-minute dbt snapshot runtime.
Upvotes: 2