Reputation: 33
I'm trying to insert records into a table in a specific (and simple) order, because the table has an IDENTITY column (e.g. MyTbl (ID INT IDENTITY(1,1), Sale_Date DATE, Product_ID INT, Sales INT)).
The query is quite simple (this is just a simplified example):
INSERT INTO MyTbl (Sale_Date, Product_ID, Sales)
SELECT Sale_Date, Product_ID, COUNT(*) AS Sales
FROM Fact_tbl
GROUP BY Sale_Date, Product_ID
ORDER BY Sale_Date, Product_ID;
The expected behavior is that when I select the highest values of the IDENTITY column ID, I should see the latest Sale_Date. However, this is not the case: the order of the ID column in the table has nothing to do with the dates. To make things even worse, if I recreate the table and run the same INSERT statement again and again, I get a different insertion order each time for the same data. I get this behavior even if I wrap the query in a subquery and put the ORDER BY inside or outside it.
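The mismatch described above can be checked directly by comparing each row's Sale_Date with the Sale_Date of the row that has the next-lower ID. This is a minimal sketch that assumes only the MyTbl definition from the question; it checks dates only, so rows sharing a Sale_Date are never counted as out of order.

```sql
-- Count rows whose Sale_Date is earlier than that of the row with the next-lower ID.
-- A non-zero result means the IDENTITY values do not follow Sale_Date order.
SELECT COUNT(*) AS out_of_order_rows
FROM (
    SELECT ID,
           Sale_Date,
           LAG(Sale_Date) OVER (ORDER BY ID) AS prev_sale_date  -- NULL for the first row
    FROM MyTbl
) t
WHERE Sale_Date < prev_sale_date;  -- the NULL comparison excludes the first row
```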
I have never seen this behavior on any other SQL platform. Is this the expected behavior in Snowflake?
Upvotes: 3
Reputation: 10199
It's expected. Let me explain the reason:
AUTOINCREMENT and IDENTITY are synonymous. If either is specified for a column, Snowflake utilizes a sequence to generate the values for the column.
https://docs.snowflake.com/en/sql-reference/sql/create-table.html#optional-parameters
There is no guarantee that values from a sequence are contiguous (gap-free) or that the sequence values are assigned in a particular order. There is, in fact, no way to assign values from a sequence to rows in a specified order other than to use single-row statements (this still provides no guarantee about gaps).
https://docs.snowflake.com/en/user-guide/querying-sequences.html#sequence-semantics
"With Snowflake, each INSERT has a completely different order than the same INSERT that ran a couple of minutes ago."
No, it should insert the data in the expected order, because you use an ORDER BY clause. The issue is that the sequence values are not assigned in any particular order.
It's not easy to verify whether the data is stored sorted when you use INSERT ... SELECT ... ORDER BY, unless you have access to the underlying metadata. For testing, you can define clustering keys on a table into which you ingested "sorted" data.
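A sketch of that test, assuming a throwaway copy of MyTbl: declare a clustering key that matches the intended sort order, then read Snowflake's clustering metadata. SYSTEM$CLUSTERING_INFORMATION returns a JSON string with fields such as average_depth; values close to 1 suggest the micro-partitions are well ordered on those columns. On a very small table that fits in one micro-partition the result tells you little.

```sql
-- Declare a clustering key matching the ORDER BY used in the INSERT.
-- Note: this also enables Automatic Clustering, which may recluster the table
-- in the background (consuming credits), so use a test table.
ALTER TABLE MyTbl CLUSTER BY (Sale_Date, Product_ID);

-- Inspect how well the stored micro-partitions follow that key (returns JSON).
SELECT SYSTEM$CLUSTERING_INFORMATION('MyTbl');
```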
Anyway, if you want to assign IDs that match the sort order when bulk-inserting data, use ROW_NUMBER instead of an IDENTITY column or sequence values.
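A minimal sketch of the ROW_NUMBER approach, reusing the question's table and query. The assumption here is that ID becomes a plain INT column instead of IDENTITY, so the statement supplies the values itself; CREATE OR REPLACE recreates the table and discards existing rows.

```sql
-- Assumption: ID is an ordinary INT column, populated explicitly below.
CREATE OR REPLACE TABLE MyTbl (
    ID         INT,
    Sale_Date  DATE,
    Product_ID INT,
    Sales      INT
);

-- ROW_NUMBER() is evaluated after GROUP BY, so each (Sale_Date, Product_ID) group
-- gets a gap-free ID that follows the requested sort order.
INSERT INTO MyTbl (ID, Sale_Date, Product_ID, Sales)
SELECT ROW_NUMBER() OVER (ORDER BY Sale_Date, Product_ID) AS ID,
       Sale_Date,
       Product_ID,
       COUNT(*) AS Sales
FROM Fact_tbl
GROUP BY Sale_Date, Product_ID;
```

To append a later batch, the numbering can be offset with (SELECT COALESCE(MAX(ID), 0) FROM MyTbl) + ROW_NUMBER() OVER (...); that offset is not safe if several sessions insert into the table concurrently.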
Upvotes: 2
Reputation: 136
This is not expected behavior in Snowflake. However, the way you insert data into your table (with the ORDER BY) doesn't affect the order in which the data is stored inside the table. You can leave the ORDER BY out of the INSERT, but you should include it in the SELECT queries you use to read the data.
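In practice, that means putting the ORDER BY on the query that reads the rows back. For example, to see the most recent sales from the question's MyTbl (the LIMIT of 10 is an arbitrary choice for illustration):

```sql
-- Storage order is not guaranteed, so request the order at read time.
-- Latest Sale_Date first, ties broken by Product_ID.
SELECT ID, Sale_Date, Product_ID, Sales
FROM MyTbl
ORDER BY Sale_Date DESC, Product_ID
LIMIT 10;
```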
Upvotes: 1