hotmeatballsoup
hotmeatballsoup

Reputation: 625

Truncating DynamoDB tables beyond what the TTL allows

I have a DynamoDB table that has a created date/time column that indicates when the record/item was inserted into the table. I have about 20 years worth of data in this table (records were migrated from a previous database), and I would now like to truncate anything older than 6 months old moving forward.

The obvious thing to do here would be to set a TTL on the table for 6 months, however my understanding is that AWS TTLs only go back a certain number of years (please correct me if you know otherwise!). So my understanding is that if I set a 6 month TTL on 20 years of data, I might delete record starting at 6 months old going back maybe 3 - 5 yearrs, but then there'd be a whole lot of really old data left over, unaffected by the TTL (again please correct me if you know otherwise!). So I guess I'm looking for:

  1. The ability to do a manual, one-time deletion of data older than 6 months old; and
  2. The ability to set a 6 month TTL moving forward

For the first one, I need to execute something like DELETE FROM mytable WHERE created > '2018-06-25', however I can't figure out how to do this from the AWS/DynamoDB management console, any ideas?

For the second part, when I go to Manage TTL in the DynamoDB console:

enter image description here

I'm not actually seeing where I would set the 6 month expiry. Is it the date/time fields at the very bottom of that dialog?! Seems strange to me...if that were the case then the TTL wouldn't be a scrolling 6 month window, it would just be a hardcoded point in time which I'd need to keep updating manually so that data is never more than 6 months old...

Upvotes: 1

Views: 1447

Answers (2)

Marboni
Marboni

Reputation: 2459

Deleting old records directly or updating their TTL so they can be deleted later by DynamoDB both require the same write capacity. You’ll need to scan / query and delete records one-by-one.

If you have, let’s say, 90% of old data, the most cost- and time-efficient way of deleting it is to move remaining 10% to a new table and delete the old one.

Another non-standard way I see is to choose an existent timestamp field you can sacrifice (for instance, audit field such as creation date), remove it from the new records and use as TTL to delete the old ones. It will allow you to do what you need cheaper and without switching to another table that may require multi-step changes in your application, but requires the field to (a) not being in use, (b) be in the past and (c) be a UNIX timestamp. If you don’t want to delete it permanently, you may copy it to another attribute and copy back after all old records have been deleted and TTL on that field is turned off (or switched to another attribute). It will not work for records having timestamp before 5 years ago.

Upvotes: 0

Renato Byrro
Renato Byrro

Reputation: 3784

You are correct about how far back in time TTL goes, it's actually 5 years. The way it works is comparing your TTL attribute value with the current timestamp. If your item has a timestamp that is older than the current timestamp, it's scheduled for deletion in the next 48 hours (it's not immediate). So, if you use the timestamp of creation of the item, everything will be scheduled for deletion as soon as you insert, and that's not what you want.

The way you manage the 6-month expiry policy is in your application. When you create an item, set a TTL attribute to a timestamp 6 months ahead of the creation time and just leave it there. Dynamo will take care of deleting it in 6 months. For your "legacy" data, I can't see a way around querying and looping through each item and setting the TTL for each of them manually.

Upvotes: 2

Related Questions