Reputation: 134
We need to add a TTL to a DynamoDB table with 3 billion+ records. It's a live table that is used very frequently, so we cannot bring it down or even redirect requests to another table. As I understand it, we need to run a script that manually adds a new TTL attribute to every item. Are there other approaches?
Any idea how long that many updates will take for 3 billion+ records? Does this carry an extra financial cost?
Thanks!
Upvotes: 1
Views: 1071
Reputation: 13731
According to https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ProvisionedThroughput.html,
UpdateItem—Modifies a single item in the table. DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes. Even if you update just a subset of the item's attributes, UpdateItem will still consume the full amount of provisioned throughput (the larger of the "before" and "after" item sizes).
So although your script only adds one small TTL attribute to each item, the cost of the operation in WCUs is the same as rewriting the entire table. If you're using on-demand billing mode, this cost can be huge, so you should definitely estimate it before you start.
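To get a rough feel for that cost, here is a minimal sketch of the arithmetic. It assumes standard (non-transactional) writes billed in 1 KB steps rounded up, with 1 KB taken as 1024 bytes, and it uses the average item size as a stand-in for the real size distribution. The function names are hypothetical, and the price passed in is a placeholder to replace with the current figure for your region.

```python
import math

def write_units_per_update(item_size_bytes: int) -> int:
    """Write units for one UpdateItem, based on the larger of the
    before/after item sizes, rounded up to the next 1 KB (assumed 1024 B)."""
    return max(1, math.ceil(item_size_bytes / 1024))

def total_write_units(item_count: int, avg_item_size_bytes: int) -> int:
    """Approximate total write units for touching every item once.
    Using the average size is a simplification of the real distribution."""
    return item_count * write_units_per_update(avg_item_size_bytes)

def on_demand_cost(write_units: int, price_per_million: float) -> float:
    """Cost in on-demand mode; price_per_million is a placeholder input."""
    return write_units / 1_000_000 * price_per_million

if __name__ == "__main__":
    units = total_write_units(3_000_000_000, 2560)  # 2.5 KB items -> 3 units each
    print(units)                         # 9000000000
    print(on_demand_cost(units, 1.25))   # 1.25 is an assumed placeholder price
```

Note how the item size dominates: the same backfill on items under 1 KB needs a third of the write units.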
However, if you're using the provisioned capacity billing mode, the cost may be more manageable: you are probably running below capacity, so you can add extra write operations at no extra cost. The question then becomes how much spare capacity you have. If you have only 1,000 writes per second to spare (and each item is at most 1 KB, so each update costs one WCU), it will take about 35 days to write the TTLs on 3 billion items at that pace. In any case, if you're using provisioned capacity, your script needs careful flow control: it should run at a pace of N updates per second, slowly increasing N, and lower N as soon as it gets a throttling error (ProvisionedThroughputExceededException). Without such flow control, you can end up starving your actual live application.
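The pacing described above can be sketched as an additive-increase, multiplicative-decrease loop. This is only an illustration: `AimdPacer`, `run_backfill`, and all the default numbers are hypothetical, and `update_fn` is assumed to be your own wrapper that performs one update and returns False when DynamoDB throttles it. The loop is single-threaded and ignores request latency, so the real rate will be somewhat below the target.

```python
import time

class AimdPacer:
    """Target rate in updates/second: creep up on success, halve on throttle."""

    def __init__(self, start_rate=50.0, min_rate=10.0, max_rate=1000.0,
                 step=0.1, backoff=0.5):
        self.rate = start_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.step = step          # added to the rate after each success
        self.backoff = backoff    # multiplier applied after a throttle

    def on_success(self):
        self.rate = min(self.max_rate, self.rate + self.step)

    def on_throttle(self):
        self.rate = max(self.min_rate, self.rate * self.backoff)

    def delay(self):
        """Seconds to wait between updates at the current rate."""
        return 1.0 / self.rate

def run_backfill(keys, update_fn, pacer, sleep=time.sleep):
    """Apply update_fn to each key, retrying throttled keys at a lower rate."""
    for key in keys:
        while not update_fn(key):   # False means the write was throttled
            pacer.on_throttle()
            sleep(pacer.delay())
        pacer.on_success()
        sleep(pacer.delay())

def backfill_days(item_count, updates_per_second):
    """How long the whole backfill takes at a steady rate."""
    return item_count / updates_per_second / 86_400

if __name__ == "__main__":
    print(round(backfill_days(3_000_000_000, 1000), 1))  # 34.7
```

Capping `max_rate` below your known spare capacity keeps the script from competing with live traffic even when no throttling has been observed yet.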
Finally, another question you'll need to address is how you will even know which items exist, so that you can add the TTL attribute to them. If you don't have some external knowledge of which keys exist, you'll unfortunately need to "Scan" the entire table to find the existing keys. You can ask Scan to return only the keys, not the full items, but this will still cost you the same as reading the entire item (although it will use less network bandwidth). Luckily, reading in DynamoDB is much cheaper than writing. Also, again, if you're using provisioned capacity, you may be able to do that scan slowly at no extra cost, using existing over-provisioned read capacity.
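A keys-only scan might look like the sketch below. It assumes a low-level client object exposing a `scan` method with the same keyword arguments as the boto3 DynamoDB client (`TableName`, `ProjectionExpression`, `ExpressionAttributeNames`, `Limit`, `ExclusiveStartKey`). The function name, the `#k0`-style placeholders, and the page size are illustrative choices, not anything from the original answer.

```python
def scan_keys(client, table_name, key_attrs, page_size=1000):
    """Yield the key attributes of every item, one page at a time.

    key_attrs lists the table's key attribute names, e.g. ["pk", "sk"].
    Placeholders (#k0, #k1, ...) avoid clashes with reserved words.
    """
    names = {f"#k{i}": attr for i, attr in enumerate(key_attrs)}
    kwargs = {
        "TableName": table_name,
        "ProjectionExpression": ", ".join(names),  # e.g. "#k0, #k1"
        "ExpressionAttributeNames": names,
        "Limit": page_size,
    }
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last item
```

With boto3 installed, passing `boto3.client("dynamodb")` as `client` should work. Saving the most recent `LastEvaluatedKey` somewhere durable would let a multi-week scan resume after a crash instead of starting over.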
Upvotes: 2
Reputation: 8887
There isn't a way to avoid some cost, but you can minimize it. Since you are adding a new attribute, you can use DynamoDB's UpdateItem, which lets you write only the attributes you are changing. That will do a couple of things.
One thing to keep in mind is that if your existing services read an item and then write the whole item back, you may run into race conditions. For example, a service reads an item without the TTL attribute, then the backfill process adds the TTL, and then the service writes the item back without the TTL, silently removing it. If you think you might have that issue, you'll need to work around it as well.
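One partial mitigation is to make the backfill write idempotent, so that a second pass can restore any TTL that a stale full-item write removed, without overwriting TTLs the application has since set itself. The sketch below only builds the arguments for a low-level `update_item` call. The function name and the `expires_at` attribute name are hypothetical. It does not stop the application from clobbering the attribute in the first place; for that, the application would also need to include the TTL in its own writes.

```python
def build_ttl_update(table_name, key, expires_at_epoch, ttl_attr="expires_at"):
    """Build kwargs for client.update_item(**kwargs).

    key is in low-level format, e.g. {"pk": {"S": "user#1"}}.
    expires_at_epoch is a Unix timestamp in seconds; DynamoDB TTL expects
    a Number attribute holding epoch seconds.
    """
    return {
        "TableName": table_name,
        "Key": key,
        # if_not_exists keeps any TTL that is already present on the item
        "UpdateExpression": "SET #ttl = if_not_exists(#ttl, :ttl)",
        "ExpressionAttributeNames": {"#ttl": ttl_attr},
        "ExpressionAttributeValues": {":ttl": {"N": str(expires_at_epoch)}},
    }

if __name__ == "__main__":
    print(build_ttl_update("my-table", {"pk": {"S": "user#1"}}, 1893456000))
```

Because the update touches only the TTL attribute, it cannot overwrite other attributes that the live application changes concurrently.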
Upvotes: -1