AWS Redshift purge policy automation

Question

AWS Redshift team recommend using TRUNCATE in order to clean up a large table. I have a continuous EC2 service that keeps adding rows to a table. I would like to apply some purging mechanism, so that when the cluster is near full it will auto delete old rows (say using the index column).

Is there some best practice for doing that?
Do I need to write my own code to handle that? (if so is there already a Python script for that that I can use e.g. in a Lambda function?)

John Rotenstein · Accepted Answer

A common practice when dealing with continuous data is to create a separate table for each month, eg Sales-2018-01, Sales-2018-02.

Then create a VIEW that combines the tables:

CREATE VIEW sales AS
SELECT * FROM Sales-2018-01
UNION
SELECT * FROM Sales-2018-02

Then, create a new table each month and remove the oldest month from the View. This effectively gives a 12-month rolling view of the data.

The benefit is that data does not have to be deleted from tables (which would then require a VACUUM). Instead, the old table can simply be dropped, or kept around for historical reporting with a different View.

See: Using Time Series Tables - Amazon Redshift

AWS Redshift purge policy automation

Answers (1)

Related Questions