Reputation: 594
I have a DynamoDB table, say data. This table has 400k items, and each item has 4 fields, one of which is status. Right now all items have status = "Y". How can I update all 400k items and set the status to "N", irrespective of the key or any condition?
In MySQL, the equivalent statement would be:
UPDATE data SET status = 'N';
I am looking to do it either through the command line or, preferably, in Python using boto3.
Upvotes: 1
Views: 1800
Reputation: 12259
The following code uses the batch_write_item DynamoDB API to update items in batches of 25, which is the maximum number of items that batch_write_item can take in a single API call. You may need to lower this number if your items are large, since a single batch is also limited to 16 MB.
Warning: this is just a proof-of-concept example; use it at your own risk.
import boto3

def update_status(item):
    # Overwrite the status attribute on the scanned item in place.
    item['status'] = {
        'S': 'N'
    }
    return item

client = boto3.client('dynamodb', region_name='<ddb-region>')

# Scan the table in pages of 25 items - the maximum number of
# write requests a single batch_write_item call accepts.
paginator = client.get_paginator('scan')
operation_parameters = {
    'TableName': '<ddb-table-name>',
    'PaginationConfig': {
        'PageSize': 25
    }
}
page_iterator = paginator.paginate(**operation_parameters)

# Rewrite each scanned page as a batch of PutRequests. A PutRequest
# replaces the whole item, which is why the scan fetches all attributes.
for page in page_iterator:
    if not page['Items']:  # an empty page would fail validation
        continue
    response = client.batch_write_item(RequestItems={
        '<ddb-table-name>': [
            {
                'PutRequest': {
                    'Item': update_status(item)
                }
            }
            for item in page['Items']
        ]
    })
    print(response)
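One more caveat: batch_write_item does not raise an error for items it fails to write - they come back in the response under UnprocessedItems and must be resubmitted. A minimal retry sketch, meant to sit inside the loop above right after the batch_write_item call (it reuses that block's client and response):

import time

# batch_write_item reports failed writes in 'UnprocessedItems'
# rather than raising; resubmit them until the map comes back empty.
unprocessed = response.get('UnprocessedItems', {})
delay = 1
while unprocessed:
    time.sleep(delay)  # crude exponential backoff between retries
    delay = min(delay * 2, 30)
    retry = client.batch_write_item(RequestItems=unprocessed)
    unprocessed = retry.get('UnprocessedItems', {})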
Upvotes: 1
Reputation: 13731
There is no easy or cheap way to do what you want to do. What you'll basically need to do is to read and write the entire database:
write:
- Use an UpdateItem request with an UpdateExpression of "set status = :N". This will only modify the "status" attribute (leaving the rest unchanged), but the cost you will incur (or the provisioned throughput you will use) will be the cost of writing the entire item. So the sum of all these operations will be the cost of re-writing the entire database.
- Add to each UpdateItem a ConditionExpression that will only update the item if it actually still exists (you can use an attribute_exists() condition on its key attribute to verify that). This will allow your workload to keep deleting items while you are making these changes.
- You cannot use BatchWriteItems (batch_writer() in boto3) to modify a group of items together, because that batch operation can only replace items, not modify an attribute of existing items. In any case, BatchWriteItems would not have reduced the costs (batches cost the same as the requests they contain).
read:
- Use a Scan operation with a ProjectionExpression that requests only the key attributes (you don't need the data). Unfortunately, the cost to you will be the same as reading the entire item, not just the keys. So the sum of the cost of all these Scan operations will be the cost of reading the entire database. (A sketch combining the read and write parts appears at the end of this answer.)
If you are using provisioned capacity for this table, you may be able to use whatever excess capacity is not used by client requests to do this change slowly, in the background, basically for "free".
Whether or not this makes sense in your case really depends on how much excess capacity (both read and write!) you have provisioned. If you do this, you'll need to watch out not to use too much capacity for this background operation and hurt your real users - you'll need some sort of controller that notices capacity-exceeded errors and reduces the amount of capacity used by the background process.
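A crude sketch of that idea - call_with_backoff is a hypothetical helper, not an AWS API, and boto3 already retries throttling errors a few times on its own; this just adds a slower outer loop:

import time
import boto3

client = boto3.client('dynamodb', region_name='<ddb-region>')

def call_with_backoff(fn, **kwargs):
    # Whenever DynamoDB reports that provisioned capacity is exhausted,
    # sleep and retry with a growing delay, so the background job
    # yields capacity back to the real users of the table.
    delay = 1
    while True:
        try:
            return fn(**kwargs)
        except client.exceptions.ProvisionedThroughputExceededException:
            time.sleep(delay)
            delay = min(delay * 2, 60)  # back off up to one minute

Each UpdateItem in the background job would then go through call_with_backoff(client.update_item, ...).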
If you actually have a lot of excess provisioned capacity that you've already paid for, you can do this background operation as quickly as you want! The read part, a Scan, can be done in parallel as quickly as you want (using the "parallel scan" feature), and the write part for different keys can also, obviously, be done in parallel.
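Putting the read and write parts together, a minimal single-threaded sketch (no parallel scan, no throttling control). It assumes the table from the question is named data and has a single partition key named id - adjust both to your schema. Note that status is a DynamoDB reserved word, so the update expression needs a name placeholder:

import boto3

client = boto3.client('dynamodb', region_name='<ddb-region>')

# Read: scan only the key attributes. A single partition key named
# 'id' is assumed here - adjust to your table's actual key schema.
paginator = client.get_paginator('scan')
pages = paginator.paginate(
    TableName='data',
    ProjectionExpression='id',
)

for page in pages:
    for key in page['Items']:
        try:
            # Write: modify only the status attribute; the condition
            # avoids re-creating items deleted by concurrent workloads.
            client.update_item(
                TableName='data',
                Key=key,
                UpdateExpression='SET #s = :n',
                ConditionExpression='attribute_exists(id)',
                ExpressionAttributeNames={'#s': 'status'},
                ExpressionAttributeValues={':n': {'S': 'N'}},
            )
        except client.exceptions.ConditionalCheckFailedException:
            pass  # the item was deleted in the meantime; skip it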
Upvotes: 2