thox
thox

Reputation: 159

Overhead of Redshift materialized view full REFRESH Vs Redshift VACUUM

I’m working on a project where I want to have a fixed/consistent amount of data in terms of a specific time period (eg. 365 days). Which approach is more efficient and will cost lesser workload between the two?

  1. Using a dedicated table and performing VACUUM to delete the oldest data
  2. Using a materialized view and refresh it to remove the oldest data from the view (incremental refresh wont be possible in my case)

Upvotes: 0

Views: 1206

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11032

There are a lot of variables that can affect which path to go down. Cluster size, table size, sortkey etc. can all come into play.

May I recommend a 3rd path?

You could populate "month" tables and combine them with a normal view. When a new month comes around a new table is created, the view is altered to include the new table and not the oldest, and drop the oldest table (or archived). The view can have a where clause that only looks back 365 days if that is important. There is no vacuum work required or refreshing of a materialized view. There is some coding in your ETL flow to recognize the need for a new table and a change to the UNIONing view.

Upvotes: 1

Related Questions