Reputation: 66
I have a snowflake query with multiple ctes and inserting into a table using a Talend job. It takes more than 90 minutes to execute the query. It is multiple cascading ctes, one is calling other and other is calling the other.
I want to improve the performance of the query. It is like 1000 lines of code and I can't paste it here. As I checked the profile and it is showing all the window functions and aggregate functions which slows down the query.
For example, the top slower is,
ROW_NUMBER() OVER (PARTITION BY LOWER(S.SUBSCRIPTIONID)
ORDER BY S.ISROWCURRENT DESC NULLS FIRST,
TO_NUMBER(S.STARTDATE) DESC NULLS FIRST,
IFF(S.ENDDATE IS NULL, '29991231', S.ENDDATE) DESC NULLS FIRST)
takes 7.3% of the time. Can you suggest an alternative way to improve the performance of the query please?
Upvotes: 2
Views: 2485
Reputation: 426
The problem is that 1000 lines are very hard for any query analyzer to optimize. It also makes troubleshooting a lot harder for you and for a future team member who inherits the code.
I recommend breaking the query up and these optimizations:
CREATE TEMPORARY TABLE AS
instead of CTEs. Add ORDER BY
as you create the table on the column that you will join or filter on. The temporary tables are easier for the optimizer to build and later use. The ORDER BY
helps Snowflake know what to optimize for with subsequent joins to other tables. They're also easier for troubleshooting.TO_NUMBER(S.STARTDATE)
and IFF(S.ENDDATE IS NULL, '29991231', S.ENDDATE)
.IDENTITY
, SEQUENCE
, or populate an INTEGER
column which you can use as the sortkey. You can also literally name this new column sortKey. Sorting an integer will be significantly faster than running a function on a DATETIME
and then ordering by it.Notes:
create or replace temporary table table1 as select * from dual;
After that you refer to table1 instead of your code instead of the CTE.create materialized view mymv as select col1, col2 from mytable;
Upvotes: 5