Reputation: 51
As the question says, I have about 200 million records in a DynamoDB table. I am writing a Node.js script that needs to delete every record that does not have a TTL. I have three ideas for this, and I am curious which approach makes sense for this many records.
1. batchWrite: scan and paginate through the whole table, deleting each record that does not currently have a TTL.
2. Push all records that don't have a TTL to a new table and then delete that table all at once.
3. Set a TTL on the records that don't have one. I can't find any information on whether this is even possible, or whether I can somehow bulk-add a TTL to all records without one.
Any information is helpful; please let me know how I should go about this. Thank you!
Upvotes: 2
Views: 1111
Reputation: 760
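I would do it like this (option 1):
import boto3

# Create a DynamoDB client
dynamodb = boto3.client('dynamodb')

# Name of the table to remove entries from
table_name = 'my-table'

# Scan only for items without a 'ttl' attribute and fetch just the key.
# 'ttl' is a DynamoDB reserved word, so it is aliased as #ttl.
scan_kwargs = {
    'TableName': table_name,
    'FilterExpression': 'attribute_not_exists(#ttl)',
    'ProjectionExpression': '#id',
    'ExpressionAttributeNames': {'#ttl': 'ttl', '#id': 'id'},
}

while True:
    # One Scan call returns at most 1 MB of data, so follow LastEvaluatedKey
    response = dynamodb.scan(**scan_kwargs)
    for item in response['Items']:
        # Delete each matching item by its primary key (assumed to be 'id')
        dynamodb.delete_item(TableName=table_name, Key={'id': item['id']})
    if 'LastEvaluatedKey' not in response:
        break
    scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']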
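With 200 million records, one delete_item call per record means a lot of round trips. A rough sketch of the batched variant mentioned in the question is below, using BatchWriteItem with at most 25 delete requests per call and resending anything returned in UnprocessedItems. The helper names (chunked, delete_keys_in_batches), the retry count, and the backoff timing are my own assumptions for illustration; keys are expected in the low-level client format, e.g. {'id': {'S': '123'}}.

```python
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_keys_in_batches(client, table_name, keys, max_retries=5):
    """Delete `keys` (a list of DynamoDB key dicts) and return how many were deleted.

    `client` is assumed to be a boto3 DynamoDB client (or anything exposing
    a compatible batch_write_item method).
    """
    deleted = 0
    for batch in chunked(keys, 25):  # BatchWriteItem accepts at most 25 requests
        pending = {table_name: [{'DeleteRequest': {'Key': key}} for key in batch]}
        for attempt in range(max_retries):
            response = client.batch_write_item(RequestItems=pending)
            unprocessed = response.get('UnprocessedItems', {})
            remaining = len(unprocessed.get(table_name, []))
            deleted += len(pending[table_name]) - remaining
            if not remaining:
                break
            # Resend only what DynamoDB did not process, after a short backoff
            pending = unprocessed
            time.sleep(min(0.05 * 2 ** attempt, 1.0))
        else:
            raise RuntimeError('some deletes were still unprocessed after retries')
    return deleted
```

It could be called as delete_keys_in_batches(boto3.client('dynamodb'), 'my-table', keys), where keys holds the keys collected from one page of the scan.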
Upvotes: 0
Reputation: 387
I would go with option 1.
Check the Parallel Scan documentation; the two relevant Scan parameters are:
Segment: A segment to be scanned by a particular worker. Each worker should use a different value for Segment.
TotalSegments: The total number of segments for the parallel scan. This value must be the same as the number of workers that your application will use.
Each worker scans its own segment, so the table is read concurrently; DynamoDB spreads the table's data across partitions of up to about 10 GB each. With the scan sped up this way, the deletes can then be issued with BatchWriteItem.
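As a minimal sketch of how Segment and TotalSegments fit together, the snippet below runs one thread per segment and collects the keys of items that have no ttl attribute. The function names, the default of 4 segments, and the 'id' key and 'ttl' attribute names are assumptions for illustration, not taken from the documentation.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment_keys(client, table_name, segment, total_segments):
    """Return the keys of all items in one segment that have no 'ttl' attribute."""
    kwargs = {
        'TableName': table_name,
        'Segment': segment,                # this worker's segment (0-based)
        'TotalSegments': total_segments,   # must be the same for every worker
        'FilterExpression': 'attribute_not_exists(#ttl)',
        'ProjectionExpression': '#id',
        # 'ttl' is a reserved word; 'id' is the assumed primary key name
        'ExpressionAttributeNames': {'#ttl': 'ttl', '#id': 'id'},
    }
    keys = []
    while True:
        response = client.scan(**kwargs)
        keys.extend({'id': item['id']} for item in response['Items'])
        if 'LastEvaluatedKey' not in response:
            return keys
        # Continue this segment from where the previous page stopped
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']


def parallel_scan_keys(client, table_name, total_segments=4):
    """Scan every segment in its own thread and return all matching keys."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment_keys, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        return [key for future in futures for key in future.result()]
```

For a table of this size you would probably not hold every key in memory; each worker could instead issue its batched deletes page by page as it scans. The parallel scan also consumes read capacity faster, so the segment count is worth tuning against the table's throughput.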
Upvotes: 2