Viktor
Viktor

Reputation: 31

How delete works in snowflake partitions

I got a question on how delete works in snowflake. . As the partitions are immutable, it will create new partition when I delete records (from multiple immutable objects), my question here is what about the remaining empty space which is allotted (16mb compressed), will it keep as it is or the whole micro partitions will re-structured (re-arrange, defragment...) again?

Upvotes: 3

Views: 1316

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

Delete is just a special case of insert & update. Which is best seen as change.

Simon and hkandpal make some very good points. About general life cycle.

As the small end I and over stack overflow users have tested that many small change to tiny table end up with still just one partition. So as Simon mentions there seems to be some form on appending/rewriting of small partitions.

But at the big end there is very much no free lunch to large change. What we have notice if mass deleting, that the output takes less partitions, which implies if you had 50 partitions hold 5000 rows each, and you delete every odd row, you end up with 25 partitions. So the write operation is bunching the partitions.

But at the same time if you have auto clustering turned on, the delete/update write is unordered, thus we have huge clustering "re-right" costs, after deleting data. Because the filter to find the rows to delete reorders the data, and that is how it is written, and then the auto clustering can spend 5x the original write cost re-ordering the data. so in some cases it's cheaper to do an order Create Table As Select with an order by clause to "delete" 1/30th of the data for 100+ GB tables.

Other facts that point to this lack of free lunch is the auto clustering does too tasks partition defragmentation, which the support engineers have mentioned in the past that might be made it's own feature. And re-ordering. We have small tables that we rebuild daily to keep the order perfect as it has huge impact for us, verse relying on auto-clustering, as the later is happy with mostly there, but the performance impact (it a table which address information that is join to almost everything) on the table being small and in order and cacheable is meaningful to us.

Upvotes: 2

Himanshu Kandpal
Himanshu Kandpal

Reputation: 1608

When a delete or update operation is performed, Snowflake deletes the partition file and replaces it with a new file with the changes.

For example, we have stored the data of a table in we delete all the records which have a name as “Andy”. Now Snowflake deletes the entire partition file P1 and replaces it with a new partition file P3 including the changes.

Jim    Partition P1
John   Partition P1
Andy   Partition P1

Joe    Partition P2
Mike   Partition P2
Jeff   Partition P2


New Partition P3

Jim    Partition P3
John   Partition P3

Upvotes: 0

Related Questions