iCHAIT
iCHAIT

Reputation: 594

How can I update an attribute in a DynamoDB table regardless of the key or condition?

I have a DynamoDB table say data. This table has 400k items. Each item has 4 fields -

Right now all items have a status = "Y". How can I update all items and set the status to "N" for all 400k items irrespective of the key or any condition?

In MySQL, an equivalent statement would be -

UPDATE data SET status = 'N';

I am looking to do it either through the command line or preferable in python using boto3

Upvotes: 1

Views: 1800

Answers (2)

jellycsc
jellycsc

Reputation: 12259

The following code uses batch_write_item DynamoDB API to update items in batches of size 25, which is the maximum number of items that batch_write_item can take in a single API call. You might need to tweak this number if your items are large.

Warning: This is just a proof of concept example. You should use at your own risk.

import boto3

def update_status(item):
    item['status'] = {
        'S': 'N'
    }
    return item

client = boto3.client('dynamodb', region_name='<ddb-region>')
paginator = client.get_paginator('scan')
operation_parameters = {
  'TableName': '<ddb-table-name>',
  'PaginationConfig': {
    'PageSize': 25
  }
}
page_iterator = paginator.paginate(**operation_parameters)
for page in page_iterator:
    response = client.batch_write_item(RequestItems={
        '<ddb-table-name>': [
            {
                'PutRequest': {
                    'Item': update_status(item)
                }
            }
            for item in page['Items']
        ]
    })
    print(response)

Upvotes: 1

Nadav Har&#39;El
Nadav Har&#39;El

Reputation: 13731

There is no easy or cheap way to do what you want to do. What you'll basically need to do is to read and write the entire database:

write:

  1. If you know the key of a single item, you can do a UpdateItem request with an UpdateExpression of "set status = :N". This will only modify the "status" attribute (leaving the rest unchanged), but the cost you will incur (or provisioned throughput you will use) will be the cost of writing the entire item. So the sum of all these operations will be the cost of re-writing the entire database.
  2. You should add to the above UpdateItem a ConditionExpression that will only update the item if the item actually still exists (you can use a attribute_exists() condition on its key attribute to verify that an item exists). This will allow your workload to delete items while doing these changes.
  3. Before starting this change process, change your client code to write new items with status = N. The change process may miss these new items, but it's fine if they are already created with status = N.
  4. You can't use BatchWriteItems (batch_writer() in boto3) to modify a group of items together, because this batch operation can only replace items - not modify an attribute of existing items. In any case, a BatchWriteItems would not have reduced the costs (batches cost the same as the requests they contain).

read:

  1. To get a list of all extant keys in the database, to do the above reads, you need to use a Scan operation, with Projection set to KEYS_ONLY to get only the keys (you don't need the data). The cost to you will be the same as read the entire item, unfortunately, not just reading the keys. So the sum of the cost of all these Scan operations will be reading the entire database.

If you are using provisioned capacity for this table, you may be able to use whatever excess capacity you have that is not used by client requests to do this change slowly, in the background, basically for "free".

Whether or not this makes sense in your case really depends on how much excess capacity (both read and write!) you have provisioned. If you do this, you'll need to watch out not to use too much capacity for this background operation and hurt your real users - you'll need to have some sort of controller that notices capacity-exceeded errors and reduce the amount of capacity used by the background process.

If you actually have a lot of excess provisioned capacity that you've already paid for, you can do this background operation as quickly as you want! The read part, a Scan, can be done in parallel as quickly as you want (using the "parallel scan" feature), and the write part for different keys can also, obviously, be done in parallel.

Upvotes: 2

Related Questions