thedre
thedre

Reputation: 111

Why is my dbt snapshot result in so many dbt_from_dates from the same day?

Can someone help me understand the behavior of my snapshot please?

I created a snapshot and scheduled it to run once daily at 10 AM UTC. It takes less than 4 minutes to run (along with a few other snapshots)

However, when I query select distinct dbt_from_date from mysnapshot where to_date(dbt_from_date) = '2021-10-07' (see screenshot 1) , the result is 993 rows spanning 8 hours of the day. I would expect that the result would be only one row with the time when the snapshot ran. Or at worst, the span should not be more than the 4 min it took to build the snapshot.

This is the code of my snapshot

{% snapshot XYZ_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
      invalidate_hard_deletes=True
    )
}}

select * from {{ source('XYZ', 'ABC') }}

{% endsnapshot %}

Screenshot 1

Upvotes: 2

Views: 919

Answers (1)

Ben
Ben

Reputation: 103

According to the DBT Snapshots documentation, it states that "For the timestamp strategy, the configured updated_at column is used to populate the dbt_valid_from, dbt_valid_to and dbt_updated_at columns." This means that your dbt_valid_from field is being populated from the "updated_at" field in your data source, rather than when the snapshot is run. This results in your values spanning a greater period than the 4-minute dbt snapshot runtime.

Upvotes: 2

Related Questions