David Posner
David Posner

Reputation: 1

Are materialized views in redshift worth their costs

I'm working on a project in aws redshift with a few billion rows where the main queries are rollups on time units. The current implementation has mvs for all these rollups. It seems to me that if redshift is all it's cracked up to be and the dist and sort keys are defined correctly the mvs should not be necessary and their costs in extra storage and maintenance (refresh). I'm wondering if anyone has analyzed this in a similar application.

Upvotes: 0

Views: 1276

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11032

You're thinking along the right path but the real world doesn't always allow for 'just do it better'.

You are correct that sometimes MVs are just used to forego the effort of optimizing a complex query but sometimes not. The selection of keys, especially distribution key, is a compromise between optimizing different workloads. Distribute one way and query A gets faster but query B gets slower. But if the results of query B don't need to be completely up to date, one can make an MV out of B and only pay the price on refresh.

Sometimes queries are very complex and time consuming (and not because they aren't optimized). The results of this query doesn't need to include the latest info to be valid so an MV can can make the cost of this query infrequent. [In reality MVs often represent complex subqueries that are referenced by a number of other queries which makes accentuates the frequent vs. infrequent value of the MV.]

Sometimes query types don't match well to Redshift's distributed, columnar nature and just don't perform well. Again, current-ness of data can be played off against cluster workload and these queries can be run at low usage times.

With all that said I think you are on the right path as I've also been trying to get people to see that many, many queries are just poorly written. Too often in the data world functionally correct equals done and in reality this is only half done. I've rewritten queries that were taking 90 minutes to execute (browning out the cluster when they ran) and got them down to 17 seconds. So keep up the good fight but use MVs as a last resort when compromise is the only solution.

Upvotes: 1

Related Questions