Saqib Ali
Saqib Ali

Reputation: 4428

Does limiting the number of columns in Snowflake CTE improve the performance of a VIEW

I have a Snowflake VIEW defined as following:

create order_amount_by_order_type_view as (
with temp_table as (
select * FROM orders inner join order_lines on orders.order_no=order_lines.order_no)
select order_type, sum(amount)
from temp_table
group by orders.order_type
)

Note that I am selecting all fields in the CTE, even though they are not needed in the main query.

Question: Is Snowflake smart enough to recognize that only the order_type and amount is used in the main and optimize the CTE accordingly?

Or should I manually limit the required fields in the CTE?

create order_amount_by_order_type_view as (
with temp_table as (
select orders.order_type, order_lines.amount FROM orders inner join order_lines on orders.order_no=order_lines.order_no)
select order_type, sum(amount)
from temp_table
group by orders.order_type
)

Upvotes: 0

Views: 503

Answers (2)

John Ryan
John Ryan

Reputation: 71

The short answer to your question is: No. But Snowflake may introduce this at some point.

The more interesting question is why is it faster to query few columns?

Snowflake is a columnar database in which the data is physically stored for fast access to specific columns.

If you were to look into Oracle or SQL Server you'd find the data was stored in rows. In Oracle these are in 4K blocks.

Row Storage

However, if you were to look in Snowflake, you'd find the data stored in columns within "micro-partitions" which are approx 16MB in size.

Column Storage

This means you should avoid using:

SELECT * FROM TABLE

As Snowflake needs to return all columns from the table. If however you were to query only the columns you needed, say the table has 100 columns and your query returned 2 - you will only return 2 columns from remote storage (which is faster), and also make better use of the warehouse cache.

Upvotes: 0

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25968

Is should be smart enough, and the test would be to:

select * from table_name and see how many bytes are read, and then select col1,col2 from table_name and see how many bytes are read.

Then look at the read from the table of your CTE in your query and see if it's the smaller.

Upvotes: 1

Related Questions