Reputation: 31
I got a question on how delete works in snowflake. . As the partitions are immutable, it will create new partition when I delete records (from multiple immutable objects), my question here is what about the remaining empty space which is allotted (16mb compressed), will it keep as it is or the whole micro partitions will re-structured (re-arrange, defragment...) again?
Upvotes: 3
Views: 1316
Reputation: 25903
Delete is just a special case of insert & update. Which is best seen as change.
Simon and hkandpal make some very good points. About general life cycle.
As the small end I and over stack overflow users have tested that many small change to tiny table end up with still just one partition. So as Simon mentions there seems to be some form on appending/rewriting of small partitions.
But at the big end there is very much no free lunch to large change. What we have notice if mass deleting, that the output takes less partitions, which implies if you had 50 partitions hold 5000 rows each, and you delete every odd row, you end up with 25 partitions. So the write operation is bunching the partitions.
But at the same time if you have auto clustering turned on, the delete/update write is unordered, thus we have huge clustering "re-right" costs, after deleting data. Because the filter to find the rows to delete reorders the data, and that is how it is written, and then the auto clustering can spend 5x the original write cost re-ordering the data. so in some cases it's cheaper to do an order Create Table As Select with an order by clause to "delete" 1/30th of the data for 100+ GB tables.
Other facts that point to this lack of free lunch is the auto clustering does too tasks partition defragmentation, which the support engineers have mentioned in the past that might be made it's own feature. And re-ordering. We have small tables that we rebuild daily to keep the order perfect as it has huge impact for us, verse relying on auto-clustering, as the later is happy with mostly there, but the performance impact (it a table which address information that is join to almost everything) on the table being small and in order and cacheable is meaningful to us.
Upvotes: 2
Reputation: 1608
When a delete or update operation is performed, Snowflake deletes the partition file and replaces it with a new file with the changes.
For example, we have stored the data of a table in we delete all the records which have a name as “Andy”. Now Snowflake deletes the entire partition file P1 and replaces it with a new partition file P3 including the changes.
Jim Partition P1
John Partition P1
Andy Partition P1
Joe Partition P2
Mike Partition P2
Jeff Partition P2
New Partition P3
Jim Partition P3
John Partition P3
Upvotes: 0