ash333
ash333

Reputation: 251

AWS Redshift vs Snowflake use cases

I was wondering if anyone has used both AWS Redshift and Snowflake and use cases where one is better . I have used Redshift but recently someone suggested Snowflake as a good alternative . My use case is basically retail marketing data that will be used by handful of analysts who are not terribly SQL savvy and will most likely have reporting tool on top

Upvotes: 15

Views: 6810

Answers (4)

Looter
Looter

Reputation: 96

I evaluated Redshift and Snowflake, and a little bit of Athena and Spectrum as well. The latter two were non-starters in cases where we had big joins, as they would run out of memory. For Redshift, I could actually get a better price to performance ratio for a couple reasons:

  • allows me to choose a distribution key which is huge for co-located joins
  • allows for extreme discounts on three year reserved pricing, so much so that you can really upsize your compute at a reasonable cost

I could get better performance in most cases with Redshift, but it requires good MPP knowledge to setup the physical schema properly. The cost of expertise and complexity offsets some of the product cost.

Redshift stores JSON in a VARCHAR column. That can cause problems (OOM) when querying a subset of JSON elements across large tables, where the VARCHAR column is sized too big. In our case we had to define the VARCHAR as extremely large to accommodate a few records that had very large JSON documents.

Snowflake functionality is amazing, including:

  • ability to clone objects
  • deep functionality in handling JSON data
  • snowpipe for low maintenance loading, auto scaling loads, trickle updates
  • streams & tasks for home grown ETL
  • ability to scale storage and compute separately
  • ability to scale compute within a minute, requiring no data migration
  • and many more

One thing that I would caution about Snowflake is that one might be tempted to hire less skilled developers/DBAs to run the system. Performance in a bad schema design can be worked around using a huge compute cluster, but that may not be the best bang for the buck. Regardless, the functionality in Snowflake is amazing.

Upvotes: 3

Sami Yabroudi
Sami Yabroudi

Reputation: 247

We recently switched from Redshift to Snowflake for the following reasons:

  • Real-time data syncing
  • Handling of concurrent queries
  • Minimizing of database administration
  • Providing different amounts of computing power to different Looker users

A more in-depth writeup can be found on our data blog.

Upvotes: 5

chi stock
chi stock

Reputation: 69

I evaluated both Redshift(Redshfit spectrum with S3) and SnowFlake.

In my poc, snowFlake is way way better than Redshift. SnowFlake integrates well with Relational/NOSQL data. No upfront index or partition key required. It works amazing without worrying about what way to access the day.

Redshift is very limited and no json support. Its hard to understand the partition. You have to do lot of work to get something done. No json support. You can use redshift specturm as a bandaid to access S3. Good luck with partioning upfront. Once you created partition in S3 bucket, you are done with that and no way to change until unless you redo process all data again to new structue. You will end up sending time to fix these issues instead of working on fixing real business problems.

Its like comparing Smartphone vs Morse code mechine. Redshift is like morse code kind of implementation and its not for mordern development

Upvotes: 6

Ron Dunn
Ron Dunn

Reputation: 3078

Redshift is a good product, but it is hard to think of a use case where it is better than Snowflake. Here are some reasons why Snowflake is better:

  • The admin console is brilliant, Redshift has none.
  • Scale-up/down happens in seconds to minutes, Redshift takes minutes to hours.
  • The documentation for both products is good, but Snowflake is better laid out and more accessible.
  • You need to know less "secret sauce" to make Snowflake work well. On Redshift you need to know and understand the performance impacts of things like distribution keys and sort keys, at a minimum.
  • The load processes for Snowflake are more elegant than Redshift. Redshift assumes that your data is in S3 already. Snowflake supports S3, but has extensions to JDBC, ODBC and dbAPI that really simplify and secure the ingestion process.
  • Snowflake has great support for in-database JSON, and is rapidly enhancing its XML. Redshift has a more complex approach to JSON, and recommends against it for all but smaller use cases, and does not support XML.

I can only think of two cases which Redshift wins hands-down. One is geographic availability, as Redshift is available in far more locations than Snowflake, which can make a difference in data transfer and statement submission times. The other is the ability to submit a batch of multiple statements. Snowflake can only accept one statement at a time, and that can slow down your batches if they comprise many statements, especially if you are on another continent to your server.

At Ajilius our developers use Redshift, Snowflake and Azure SQL Data Warehouse on a daily basis; and we have customers on all three platforms. Even with that choice, every developer prefers Snowflake as their go-to cloud DW.

Upvotes: 24

Related Questions