Aditya Sahu
Aditya Sahu

Reputation: 63

How to efficiently delete data from Redshift?

I have data in my Redshift cluster. I need to find the best and efficient way to delete the previously stored data when I re run the job.

I have these two column to determine previous data previous_key (column that corresponds to run_dt) and creat_ts (time when we load the data)

I found two approaches so far but they don't work in efficient way:

Please suggest any good approach to rerun jobs on Redshift cluster. Note: partitions functionality is not available.

Upvotes: 1

Views: 11200

Answers (2)

Ryan
Ryan

Reputation: 299

It sounds like you want to delete data after a certain time period.

In this case Redshift has a recommended approach "Time-Series Tables":


Basically, you create a new table for every insert of a fixed time window. Then the main interface to this data is a view that UNION's all of these tables together.

When you want to drop data after a time window, you can simply drop the entire table / remove it from the view definition. No Vacuum / Analyze / Expensive queries required.

Source: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html

Upvotes: 1

Eralper
Eralper

Reputation: 6622

Deleting data stored in Redshift with DELETE command will take time. The reason is that you are doing a soft delete, I mean you mark existing rows as deleted and then insert new row representing updated form of the data.

So one way is executing DELETE for junks of data. Instead of deleting one by one you should try to address multiple rows. Since each write takes place in 1 MB chunks of data, we should be minimizing those data read and writes eventually.

If you have a good information about the topology of the data stored in Redshift compute nodes and slices, addition to that information about distribution key and sort key, you can separate your DELETE command into multiple statements. (Any how we are expecting Redshift SQL Engine to do this for the SQL developer)

Upvotes: 2

Related Questions