micah
micah

Reputation: 8096

Merge Or Create Table And Swap?

I have a table in snowflake that I merge changes into often (1-2 times per day). Row updates are streamed into an updates table and I query the updates table as will as the primary table at the same time. When I do merge the updates into the primary table, I use the the MERGE command:

MERGE INTO primary p
        USING (
            SELECT ... FROM (
               SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY last_updated_at DESC) rn FROM updates
            ) WHERE rn = 1
        ) se
        ON (p.id = u.id)
        WHEN MATCHED THEN
            UPDATE SET ...
        WHEN NOT MATCHED THEN
            INSERT (...)
            VALUES (...)

The number of updates and the size of the table means that my primary table is entirely rewritten when this happens. While not ideal, this alone is not an issue for us because we don't do it often and when we do we use a large enough warehouse to complete the merge in under 10m.

My question is, since the table gets rebuilt anyway, is there a reason not to merge results into an entirely new table where the results can be pre-clustered (order by cluster columns) and then swap the tables? Is merge doing anything more efficiently here?

Upvotes: 1

Views: 612

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

The lack of ORDER BY on the UPDATE/MERGE commands, is a blessing if you want raw performance, of the UPDATE/MERGE.

But if as you note, if you are using clustering, then after the update/merge you might have hours of poor SELECT performance, and pay a lot more of the privilege of the auto-clustering on the table. We experienced a 2x+ cost of the auto clustering as compared to a full rewrite for a table for an update to a table where 1/3 of the partitions where re-written. The is just the computer costs of the full table write verse the auto clustering credit uses, over the couple hours after the update. I didn't track what the increase in disk usage was because of the churn, but normally considered space free in comparison to compute credits.

Upvotes: 1

Related Questions