Reputation: 471

Why does Snowflake CREATE TABLE AS SELECT (CTAS) ignore the ORDER BY clause?

The command is:

drop table if exists metrics_done;
create table metrics_done as select * from metrics where end_morning='2022-03-31' order by LOG_INFO desc;

The expected behaviour is creation of a table with sorted entries. But this does not happen. Why?

Upvotes: 1

Answers (1)

Greg Pavlik

Reputation: 11086

Snowflake does apply the ORDER BY in a CTAS. You can confirm this with the SYSTEM$CLUSTERING_INFORMATION function, subject to some limitations around high-cardinality columns and around how the function checks clustering state before the automatic clustering service has run with a new key at least once.
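As a rough sketch of that check, assuming the metrics_done table from the question and that LOG_INFO is the column you want to inspect, the call could look like this:

```sql
-- Assumption: metrics_done was created by the CTAS in the question,
-- and LOG_INFO is the column used in its ORDER BY.
-- The second argument names the column(s) to evaluate clustering on,
-- so the table does not need a defined clustering key for this to run.
select system$clustering_information('metrics_done', '(LOG_INFO)');
```

The function returns a JSON string; values such as average_depth and average_overlaps give a rough indication of how well the micropartitions are organized on that column, within the limitations mentioned above.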

However, just because Snowflake uses the ORDER BY in a CTAS, it doesn't mean the rows will return in order without using an ORDER BY clause. Snowflake is an MPP system and will scan multiple micropartitions during a query. Without specifying an ORDER BY, there is no reason the optimizer should generate a plan that guarantees order. The plan it generates can and will return rows in the order they're ready for the result.

Here's an oversimplified example: in a CTAS you order by date, and all rows in micropartition 1 have date 2022-01-01 while all rows in micropartition 2 have date 2022-01-02. When you select rows from that table, the scan of micropartition 2 is just as likely to finish first as the scan of micropartition 1. If micropartition 2 finishes first, its rows will come first in the result set.

Also, when the table becomes large and it has more micropartitions assigned to scan than there are available CPUs in the warehouse, one or more CPUs will need to scan multiple micropartitions. In this case, there's no reason to prefer to scan one micropartition before another.
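In practice this means that if you need the rows back in a particular order, you have to ask for it at query time. A minimal sketch, reusing the table and column names from the question:

```sql
-- The ORDER BY in the CTAS influences how rows are laid out in storage,
-- but only an ORDER BY on the SELECT itself guarantees the order of the result set.
select *
from metrics_done
order by LOG_INFO desc;
```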

Upvotes: 4
