I am very new to Snowflake and while working with snowflake I had conflict between the below 2 options. Single warehouse with size X-Large (16 credits / hour) Multi-cluster ( with max clusters=2 & min clusters=2 ) with size Large (8 credits / hour) Considering the above 2 options Is there any advantage I can get by choosing 2nd option in terms of performance ? Note : I know the advantages of multi-cluster over a single warehouse. Please share your answer for this specific scenario (when min = max ).

Reputation: 998

Snowflake multi-cluster warehouse performance vs single warehouse with large warehouse size

I am very new to Snowflake and while working with snowflake I had conflict between the below 2 options.

Single warehouse with size X-Large (16 credits / hour)
Multi-cluster (with max clusters=2 & min clusters=2) with size Large (8 credits / hour)

Considering the above 2 options
Is there any advantage I can get by choosing 2nd option in terms of performance?

Note: I know the advantages of multi-cluster over a single warehouse. Please share your answer for this specific scenario (when min = max).

Upvotes: 2

Answers (3)

Simeon Pilgrim

Reputation: 25968

So the things that happen in running a query are.

belong I am going to just use single to mean the single instance and 'multi` to mean the multi instance cluster, of which when we run a query it is only ever on one instance.

Reading\Writing IO from your storage layer:

Here a single has twice the IO over the multi thus if your query is IO saturated the single is the better choice.

Parallel steps:

So if you have a GROUP BY over a high cardinality columns, both the single and multi should be equally good. If you have a low cardinality but billions of rows, the smaller instance might give better results as those complex steps cannot be broken over the larger cluster size of the single instance. But this is most likely lost in the wash if you have many concurrent queries.

Many queries / Noisy neighbour:

If you have hundreds of queries hitting in waves the larger single instance is worse at starting those queries, as it just has less concurrent tasks at once, and a single very large query which can flush caches, or just dominate cluster, means you stop handling normal/small queries. Where-as having the mutli cluster allow if only one "super heavy" query comes in, you only stall half your normal queries.

Other thoughts It also really depends on your load patterns, at my last job, we had auto-scaling cluster of SMALL instances used to used to answer our read queries of dashboards, reports, and we allowed it to run a little over provisioned, so things were snappy. Where-as our data loading ran on second auto-scaling cluster of MEDIUM instances, and which we overloaded on purpose, as we were trying to load data the fastest/cheapest. And in non-peak times we programmatically reduced the auto-scalling MAX to almost starve the load. But would do some expensive reprocessing on a LARGE instance via those saved credits in "the middle of the night" and also our loading tasks had the ability to spin up exclusive LARGE+ size warehouses to do one off rebuilds, as this was all IO bound work, and thus the smaller the window of "outage" the better the system was, and the IO scale linear, so the total cost was the same.

Which is all to say, "what is best" really depends on what you are doing, your budget, and the trade offs you are prepared for. The golden thing about snowflake it is not like a classic DB where you have to pick the size and get it right, pick one, and watch it, if it's struggling change it on the fly. We did this a number of times when a release of our code or snowflake changed the performance of some critical SQL, we would jump in, and double or triple the instance count, or size, to get past the situation, while trying to fix or work around SF issues, or awaiting SF to roll a release back. for a couple hours generally spending more credits is not budget braking. This flexibility also means you can just experiment, "what happens if we trying 4x smaller instance.." "oh nothing... look we just saved heaps of money"..

Upvotes: 2

Gokhan Atil

Reputation: 10079

Based on your comment, here is my answer:

In both scenarios, you will have the same resources to process your queries. The important difference would be about running single heavy queries. As you may know, a single query can not spawn to multiple clusters (yet), so when you run a query in your multi-cluster warehouse, it will be processed on one of the Large warehouses (and use max 8 nodes).

If you run the same query on your single XL warehouse, it can be executed by (max) 16 nodes. So if you will run heavy queries which requires more memory and CPU, using a single XL warehouse would be better for you.

About concurrency, there is a parameter named "MAX_CONCURRENCY_LEVEL". Its default value is 8, and it limits maximum number of concurrent executions per warehouse. If you do not change it, your single XL warehouse will execute a maximum of 8 queries concurrently, while your multi-cluster warehouse can execute 16 queries concurrently.

https://docs.snowflake.com/en/sql-reference/parameters.html#max-concurrency-level

You may increase this parameter, and provide same concurrency on both single XL and multi-cluster L warehouse. But in this case, you should be careful when you runn heavy and light queries together. Because one query may use most of the resources of the warehouse, and your light queries may have less resources and take a longer time. So I would recommend using a multi-cluster warehouse, if you will have "relatively" light/concurrent queries.

Upvotes: 2

NickW

Reputation: 9788

If you have min=max=2 then you permanently have 2 warehouses running (as long as they are not suspended). If you configure your multi-cluster warehouse like this then you lose a lot of the advantages but for your specific use case it might make sense, I suppose

Upvotes: 1

Snowflake multi-cluster warehouse performance vs single warehouse with large warehouse size

Answers (3)

Related Questions