Omer Biber

Reputation: 1

Clustering table with many columns

I am having ETL runtime issues with a specific table that has ~100K rows and 650 columns, most of them ints (the rest are strings or dates). The table is clustered by two string columns, and the ETL logic is built mainly from LAG, COALESCE, CASE and LEAST expressions; the LAGs are partitioned by the same columns used in the table's clustering key. This step of the ETL suffers from a very long runtime. I am not very familiar with Snowflake's cost/performance model, and debugging without knowing where to start takes too long (a single run of the query takes about an hour!). Any suggestions on where to start, or reading material that could help me solve this faster? A simplified sketch of the query pattern is below.
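To make the pattern concrete, one of the expressions looks roughly like this (table and column names are made up; the real query repeats this over hundreds of columns):

    SELECT
        -- previous value within the same clustering-key partition, defaulting to 0
        COALESCE(
            LAG(amount) OVER (PARTITION BY cluster_col1, cluster_col2
                              ORDER BY event_date),
            0
        ) AS prev_amount,
        LEAST(amount, credit_limit) AS capped_amount,
        CASE WHEN amount > 0 THEN 'POS' ELSE 'NEG' END AS sign_flag
    FROM my_wide_table;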

Upvotes: 0

Views: 61

Answers (1)

peterb

Reputation: 727

Cluster keys in Snowflake work by reducing the number of micro-partitions that have to be scanned. Since your table is very small (~100K rows will typically fit in only a handful of micro-partitions), a cluster key is pointless here and will likely have no impact on performance. I recommend looking at the query profile while the ETL step is running to see where the time is actually spent.

https://docs.snowflake.com/en/user-guide/ui-query-profile.html
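You can also sanity-check what the cluster key is (or is not) buying you with SYSTEM$CLUSTERING_INFORMATION; the table name here is just a placeholder:

    -- Reports total/average micro-partition counts, clustering depth, etc.
    SELECT SYSTEM$CLUSTERING_INFORMATION('my_wide_table');

For a table this small you will likely see only a few micro-partitions in total, which is exactly why the cluster key cannot help much.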

Also, have you tried scaling to a larger warehouse?
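For example (warehouse name is a placeholder; each size step roughly doubles both the compute and the credit cost):

    ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';

Note that window functions like LAG may not parallelize well if the partitions are large, so check the query profile first before paying for a bigger warehouse.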

Upvotes: 0
