datahack

Reputation: 701

DynamoDB update one column of all items

We have a huge DynamoDB table (~4 billion items). One of the columns is a category (string), and we would like to either map it to a new category_id (integer) column or convert the existing column from string to int. Is there a way to do this efficiently, that is, by updating the existing table rather than creating a new table and populating it from scratch?

Upvotes: 1

Views: 6288

Answers (1)

Charles

Reputation: 23858

Is there a way to do this efficiently

Not in DynamoDB; that use case is not what it's designed for.

Also note that DynamoDB (DDB) doesn't have columns, unless you're talking about the hash or sort key (of the table or of an existing index). Everything else is a per-item attribute.

You'd run Scan() in a loop, since each call returns at most 1 MB of data.

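Then update each item one at a time with UpdateItem. (Note that batching does not help here: BatchWriteItem only supports puts and deletes, and grouping updates into a transaction would only save network overhead, since each item is still written individually.)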
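The Scan-then-UpdateItem loop could be sketched roughly as follows. This is a minimal sketch, not a tuned migration job: the hash key name "pk", the attribute names "category" and "category_id", and the CATEGORY_IDS mapping are all assumptions made for illustration. The client argument is expected to behave like boto3's low-level DynamoDB client (its scan and update_item calls).

```python
# Hypothetical names: "pk" as the table's hash key, "category"/"category_id"
# as attribute names, and CATEGORY_IDS as the string-to-int mapping.
CATEGORY_IDS = {"books": 1, "music": 2}


def backfill_category_ids(client, table_name):
    """Scan the table page by page and set category_id on each mapped item."""
    updated = 0
    scan_kwargs = {
        "TableName": table_name,
        # Fetch only the key and the category to keep pages small.
        "ProjectionExpression": "#k, #c",
        "ExpressionAttributeNames": {"#k": "pk", "#c": "category"},
    }
    while True:
        page = client.scan(**scan_kwargs)
        for item in page.get("Items", []):
            category = item.get("category", {}).get("S")
            if category not in CATEGORY_IDS:
                continue  # no category on this item, or no mapping for it
            # One UpdateItem call per item. UpdateItem creates the item if it
            # was deleted since the scan, so a real job may want a
            # ConditionExpression here.
            client.update_item(
                TableName=table_name,
                Key={"pk": item["pk"]},
                UpdateExpression="SET #id = :id",
                ExpressionAttributeNames={"#id": "category_id"},
                ExpressionAttributeValues={
                    ":id": {"N": str(CATEGORY_IDS[category])},
                },
            )
            updated += 1
        # LastEvaluatedKey is absent once the final page has been returned.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return updated
        scan_kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed, this would be called as backfill_category_ids(boto3.client("dynamodb"), "MyTable"), where "MyTable" is a placeholder. At 4 billion items, each item still costs one read and one write, so the loop will be slow and consume a lot of capacity.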

If the attribute in question is used as a key in the table or in an existing index, then a new table is your only option. Here's a good article with a strategy for migrating a production table.

  1. Create a new table (let us call this NewTable), with the desired key structure, LSIs, GSIs.
  2. Enable DynamoDB Streams on the original table
  3. Associate a Lambda with the Stream, which pushes each record into NewTable. (This Lambda should trim off the migration flag set in Step 5.)
  4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has attributes: Primary Key, and Migrated (See Step 5).
  5. Scan the GSI created in the previous step (or entire table) and use the following Filter:
    FilterExpression = "attribute_not_exists(Migrated)"
    Update each item in the table with a migration flag (i.e. "Migrated": { "S": "0" }) using the UpdateItem API, to ensure no data loss occurs. Each update sends the item to DynamoDB Streams.

NOTE: You may want to increase write capacity units on the table during the updates.

  6. The Lambda will pick up all items, trim off the Migrated flag, and push them into NewTable.
  7. Once all items have been migrated, repoint the code to the new table.
  8. Remove the original table and the Lambda function once you are happy all is good.
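The flagging scan (Step 5) and the stream Lambda (Step 3) from the list above might look something like this. It is only a sketch under assumptions: the hash key name "pk" and the helper names are hypothetical, the stream is assumed to use a view type that includes the new image (NEW_IMAGE or NEW_AND_OLD_IMAGES), and REMOVE events are deliberately left out. If the optional GSI is used, an IndexName would be added to the scan arguments.

```python
# Hypothetical names: "pk" as the original table's hash key and "NewTable" as
# the destination table; "Migrated" is the flag attribute from Step 5.

def flag_unmigrated_items(client, table_name):
    """Step 5: touch every item lacking the Migrated flag so it enters the stream."""
    touched = 0
    scan_kwargs = {
        "TableName": table_name,
        "FilterExpression": "attribute_not_exists(Migrated)",
    }
    while True:
        page = client.scan(**scan_kwargs)
        for item in page.get("Items", []):
            client.update_item(
                TableName=table_name,
                Key={"pk": item["pk"]},
                UpdateExpression="SET Migrated = :flag",
                ExpressionAttributeValues={":flag": {"S": "0"}},
            )
            touched += 1
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return touched
        scan_kwargs["ExclusiveStartKey"] = last_key


def copy_stream_records(event, client, new_table="NewTable"):
    """Step 3: write each inserted or modified item to NewTable without the flag."""
    copied = 0
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events are not handled in this sketch
        # NewImage is already in DynamoDB JSON, which put_item accepts as-is.
        item = dict(record["dynamodb"]["NewImage"])
        item.pop("Migrated", None)
        client.put_item(TableName=new_table, Item=item)
        copied += 1
    return copied


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can run without AWS
    return copy_stream_records(event, boto3.client("dynamodb"))
```

Because the Lambda receives ordinary application writes as well as the flagging updates, NewTable stays in sync while the scan is still running.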

Upvotes: 3
