ensnare

Reputation: 42033

Copying one table to another in DynamoDB

What's the best way to identically copy one table over to a new one in DynamoDB?

(I'm not worried about atomicity).

Upvotes: 72

Views: 120595

Answers (15)

Jorge Tovar

Reputation: 1869

Python + boto3

The script is idempotent as long as you keep the same keys.

import boto3


def migrate(source, target):
    dynamo_client = boto3.client('dynamodb', region_name='us-east-1')
    dynamo_target_client = boto3.client('dynamodb', region_name='us-west-2')

    dynamo_paginator = dynamo_client.get_paginator('scan')
    dynamo_response = dynamo_paginator.paginate(
        TableName=source,
        Select='ALL_ATTRIBUTES',
        ReturnConsumedCapacity='NONE',
        ConsistentRead=True
    )
    for page in dynamo_response:
        for item in page['Items']:
            dynamo_target_client.put_item(
                TableName=target,
                Item=item
            )


if __name__ == '__main__':
    migrate('awesome-v1', 'awesome-v2')

Upvotes: 6

moorthy.coder

Reputation: 41

This Lambda function does the expected copy and paste from a source table to a destination table.

#event

{
  "table_source": "table1",
  "table_destination": "table2"
}

#AWS Lambda

import { DynamoDBClient, ScanCommand, PutItemCommand } from "@aws-sdk/client-dynamodb";
const client = new DynamoDBClient({region: "ap-south-1"});


export const handler = async (event) => {

  //scanning the source table data
  let Items;
  const scan_command_input = {TableName: event.table_source};
  const scanCommand = new ScanCommand(scan_command_input);
  try {
    const response  = await client.send(scanCommand);
    //console.log(response);
    //console.log(response.Items);
    Items=response.Items;
  } catch (error) {
    // bail out here, otherwise Items is undefined and the code below throws
    console.error(error);
    return;
  }
  console.log("Successfully scanned data from source table");
  console.log("Copying", Items.length, "Items");
  console.log(Items);
  
  //dumping data into destination table
  for (let i = 0; i < Items.length; i++) {
    const put_command_input = { TableName: event.table_destination, Item: Items[i] };
    const putItemCommand = new PutItemCommand(put_command_input);
    try {
      const response = await client.send(putItemCommand);
      console.log(response);
    } catch (error) {
      console.error(error);
    }
  }
  
  console.log("Successfully dumped data into destination table");
};

Upvotes: 0

Luis Ciber

Reputation: 300

Improved script to support batch-write-item with more than 25 items.

#!/bin/bash

# exit on error
set -eo pipefail

# tables
TABLE_FROM=$1
TABLE_TO=$2

BATCH_COUNT=0
NEXT_BATCH_TOKEN=null

while true; do
  # read
  if [ $NEXT_BATCH_TOKEN == "null" ]; then
  read_response=$(aws dynamodb scan \
    --table-name "$TABLE_FROM" \
    --output json \
    --max-items 25)
  else
  read_response=$(aws dynamodb scan \
    --table-name "$TABLE_FROM" \
    --output json \
    --max-items 25 \
    --starting-token $NEXT_BATCH_TOKEN)
  fi

  # stop if this page returned no items (also avoids writing an empty payload file)
  items_count=$(echo "$read_response" | jq '.Items | length')
  if [ "$items_count" -eq 0 ]; then
    break
  fi

  echo "$read_response" | jq "{ \"$TABLE_TO\": [ .Items[] | { PutRequest: { Item: . } } ] }" \
    > "$TABLE_TO-payload_batch_$BATCH_COUNT.json"

  # extract next batch token
  NEXT_BATCH_TOKEN=$(echo $read_response | jq -r '.NextToken')
  
  # check if there are more items to read
  if [ $NEXT_BATCH_TOKEN == "null" ]; then
    break
  fi
  
  BATCH_COUNT=$((BATCH_COUNT+1))
done

# write
for i in $(seq 0 $BATCH_COUNT); do
  aws dynamodb batch-write-item \
    --request-items file://"$TABLE_TO-payload_batch_$i.json"
done

Upvotes: 0

I like the idea of having a simple bash script for this, so I decided to take @bogdan-kiselitsa's excellent answer and extend it for tables larger than 25 items.

# exit on error
set -eo pipefail

# tables
TABLE_FROM=$1
TABLE_TO=$2

# read
aws dynamodb scan \
  --table-name "$TABLE_FROM" \
  --output json \
 | jq "[ .Items[] | { PutRequest: { Item: . } } ]" \
 > "$TABLE_FROM-dump.json"

table_size="$(cat "${TABLE_FROM}-dump.json" | jq '. | length')"
echo "table size: ${table_size}"

# write in batches of 25
for i in $(seq 0 25 $table_size); do
  j=$(( i + 25 ))
  cat "${TABLE_FROM}-dump.json" | jq -c '{ "'$TABLE_TO'": .['$i':'$j'] }' > "${TABLE_TO}-batch-payload.json"
  echo "Loading records $i through $j (up to $table_size) into ${TABLE_TO}"
  aws dynamodb batch-write-item --request-items file://"${TABLE_TO}-batch-payload.json"
  rm "${TABLE_TO}-batch-payload.json"
done


# clean up
rm "${TABLE_FROM}-dump.json"

If you save this to migrate.sh then you can run:

$ ./migrate.sh table_v1 table_v2

Upvotes: 19

Luke Worth

Reputation: 631

You can copy data between existing tables using the AWS CLI:

  1. Run:
aws ddb select $SOURCE_TABLE >file.yaml
  2. Edit file.yaml to delete everything except for the list of items.
  3. Run:
aws ddb put $DESTINATION_TABLE "$(<file.yaml)"

Upvotes: 3

Tom Barton

Reputation: 41

This is a little script I made to copy the contents of one table to another. It's based on the AWS SDK v3. Not sure how well it would scale to big tables, but as a quick and dirty solution it does the job.

It gets your AWS credentials from a profile in ~/.aws/credentials; change "default" to the name of the profile you want to use.

Other than that, it takes two arguments: one for the source table and one for the destination.

const { fromIni } = require("@aws-sdk/credential-providers");
const { DynamoDBClient, ScanCommand, PutItemCommand } = require("@aws-sdk/client-dynamodb");

const ddbClient = new DynamoDBClient({
  credentials: fromIni({profile: "default"}),
  region: "eu-west-1",
});

const args = process.argv.slice(2);
console.log(args)

async function main() {

  const { Items } = await ddbClient.send(
    new ScanCommand({
      TableName: args[0],
    })
  );
  console.log("Successfully scanned table")
  console.log("Copying", Items.length, "Items")

  const putPromises = [];

  Items.forEach((item) => {
    putPromises.push(
      ddbClient.send(
        new PutItemCommand({
          TableName: args[1],
          Item: item,
        })
      )
    );
  });

  await Promise.all(putPromises);
  console.log("Successfully copied table")
}

main();

Usage

node copy-table.js <source_table_name> <destination_table_name>

Upvotes: 4

harishanth raveendren

Reputation: 604

It's been a very long time since the question was posted, and AWS has been continuously adding features. At the time of writing this answer, we have the option to export the table to an S3 bucket and then use the import feature to bring that data from S3 into a new table, which is created automatically with the data. Please refer to this blog for more detail on export & import.

The best part is that you get to change the name, PK or SK.

Note: you have to enable PITR (point-in-time recovery), which might incur additional costs. It's always best to refer to the documentation.
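
For reference, a minimal boto3 sketch of that flow; the table name, table ARN, bucket and region are placeholders you'd replace with your own values:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# 1. Enable point-in-time recovery on the source table (required for export).
#    It can take a moment to become active after enabling.
dynamodb.update_continuous_backups(
    TableName='my-source-table',
    PointInTimeRecoverySpecification={'PointInTimeRecoveryEnabled': True}
)

# 2. Kick off an export of the table to S3 (runs asynchronously).
export = dynamodb.export_table_to_point_in_time(
    TableArn='arn:aws:dynamodb:us-east-1:123456789012:table/my-source-table',
    S3Bucket='my-export-bucket',
    ExportFormat='DYNAMODB_JSON'
)
print(export['ExportDescription']['ExportStatus'])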

Upvotes: 0

Nicolai Lissau

Reputation: 8312

Another option is to download the table as a .csv file and upload it with the following snippet of code.

This also eliminates the need to provide your AWS credentials to a package such as the one @ezzat suggests.

  1. Create a new folder and add the following two files and your exported table
  2. Edit uploadToDynamoDB.js and add the filename of the exported table and your table name
  3. Run npm install in the folder
  4. Run node uploadToDynamoDB.js

File: package.json

{
  "name": "uploadtodynamodb",
  "version": "1.0.0",
  "description": "",
  "main": "uploadToDynamoDB.js",
  "author": "",
  "license": "ISC",
  "dependencies": {
    "async": "^3.1.1",
    "aws-sdk": "^2.624.0",
    "csv-parse": "^4.8.5",
    "fs": "0.0.1-security",
    "lodash": "^4.17.15",
    "uuid": "^3.4.0"
  }
}

File: uploadToDynamoDB.js

var fs = require('fs');
var parse = require('csv-parse');
var async = require('async');
var _ = require('lodash')

var AWS = require('aws-sdk');

// If your table is in another region, make sure to update this
AWS.config.update({ region: "eu-central-1" });
var ddb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

var csv_filename = "./TABLE_CSV_EXPORT_FILENAME.csv";
var tableName = "TABLENAME"

function prepareData(data_chunk) {

  const items = data_chunk.map(obj => {

    const keys = Object.keys(obj)

    let attr = Object.values(obj)

    attr = attr.map(a => {

        let newAttr;
        // Can we make this an integer
        if (isNaN(Number(a))) {
            newAttr = { "S": a }
        } else {
            newAttr = { "N": a }
        }

        return newAttr

    })

    let item = _.zipObject(keys, attr)

    return {
        PutRequest: {
            Item: item
        }
    }
  })

  var params = {
    RequestItems: {
        [tableName]: items
    }
  };

  return params

}

rs = fs.createReadStream(csv_filename);
parser = parse({
    columns : true,
    delimiter : ','
}, function(err, data) {

    var split_arrays = [], size = 25;

    while (data.length > 0) {
            split_arrays.push(data.splice(0, size));
    }

    data_imported = false;
    chunk_no = 1;

    async.each(split_arrays, function(item_data, callback) {

        const params = prepareData(item_data)

        ddb.batchWriteItem(
            params,
            function (err, data) {
                if (err) {
                    console.log("Error", err);
                } else {
                    console.log("Success", data);
                }
            });

    }, function() {
            // run after loops
            console.log('all data imported....');

    });

});
rs.pipe(parser);

Upvotes: 0

Saurabh Shrivastava

Reputation: 1113

DynamoDB now supports importing from S3.

https://aws.amazon.com/blogs/database/amazon-dynamodb-can-now-import-amazon-s3-data-into-a-new-table/

So, probably in almost all use cases, the easiest and cheapest way to replicate a table is:

  1. Use the "Export to S3" feature to dump the entire table into S3. Since this uses backups to generate the dump, the table's throughput is not affected, and it is very fast as well. You need to have point-in-time recovery (PITR) enabled. See https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

  2. Use "Import from S3" to import the dump created in step 1. The import automatically creates the new table for you; you cannot import into an existing table. A rough sketch of this step is shown below.
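
A rough boto3 sketch of step 2; the bucket name, key prefix and key schema are placeholders for whatever your export in step 1 produced:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Import the S3 export into a brand-new table.
# The S3KeyPrefix should point at the export's data files.
dynamodb.import_table(
    S3BucketSource={
        'S3Bucket': 'my-export-bucket',
        'S3KeyPrefix': 'AWSDynamoDB/01234567890123-abcdefgh/data/'
    },
    InputFormat='DYNAMODB_JSON',
    InputCompressionType='GZIP',
    TableCreationParameters={
        'TableName': 'my-table-copy',
        'AttributeDefinitions': [{'AttributeName': 'id', 'AttributeType': 'S'}],
        'KeySchema': [{'AttributeName': 'id', 'KeyType': 'HASH'}],
        'BillingMode': 'PAY_PER_REQUEST'
    }
)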

Upvotes: 5

Bogdan Kiselitsa

Reputation: 348

Here's one solution to copy all items from one table to another, just using shell scripting, the AWS CLI and jq. Will work OK for smallish tables.

# exit on error
set -eo pipefail

# tables
TABLE_FROM=<table>
TABLE_TO=<table>

# read
aws dynamodb scan \
  --table-name "$TABLE_FROM" \
  --output json \
 | jq "{ \"$TABLE_TO\": [ .Items[] | { PutRequest: { Item: . } } ] }" \
 > "$TABLE_TO-payload.json"

# write
aws dynamodb batch-write-item --request-items file://"$TABLE_TO-payload.json"

# clean up
rm "$TABLE_TO-payload.json"

If you want both tables to be identical, you'd want to delete all items in TABLE_TO first.

Upvotes: 8

Saisumanth Gopisetty

Reputation: 946

Create a backup (Backups option) and restore the table with a new table name. That will get all the data into the new table. Note: this takes a considerable amount of time depending on the table size.
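
A minimal boto3 sketch of the backup-and-restore flow, with hypothetical table and backup names:

import time

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# 1. Take an on-demand backup of the source table.
backup = dynamodb.create_backup(
    TableName='my-source-table',
    BackupName='my-source-table-backup'
)
backup_arn = backup['BackupDetails']['BackupArn']

# 2. Wait until the backup is AVAILABLE before restoring.
while True:
    details = dynamodb.describe_backup(BackupArn=backup_arn)
    if details['BackupDescription']['BackupDetails']['BackupStatus'] == 'AVAILABLE':
        break
    time.sleep(5)

# 3. Restore the backup into a new table (the restore itself also takes a while).
dynamodb.restore_table_from_backup(
    TargetTableName='my-destination-table',
    BackupArn=backup_arn
)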

Upvotes: 59

mkobit

Reputation: 47259

On November 29th, 2017, Global Tables were introduced. This may be useful depending on your use case, which may not be the same as the original question. Here are a few snippets from the blog post:

Global Tables – You can now create tables that are automatically replicated across two or more AWS Regions, with full support for multi-master writes, with a couple of clicks. This gives you the ability to build fast, massively scaled applications for a global user base without having to manage the replication process.

...

You do not need to make any changes to your existing code. You simply send write requests and eventually consistent read requests to a DynamoDB endpoint in any of the designated Regions (writes that are associated with strongly consistent reads should share a common endpoint). Behind the scenes, DynamoDB implements multi-master writes and ensures that the last write to a particular item prevails. When you use Global Tables, each item will include a timestamp attribute representing the time of the most recent write. Updates are propagated to other Regions asynchronously via DynamoDB Streams and are typically complete within one second (you can track this using the new ReplicationLatency and PendingReplicationCount metrics).
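
If you go this route, here is a rough boto3 sketch of creating a global table with the original (2017.11.29) version of the feature; it assumes the per-region replica tables already exist with the same name and key schema, are empty, and have DynamoDB Streams enabled with new and old images:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Region names are placeholders; list the Regions that already hold the replica tables.
dynamodb.create_global_table(
    GlobalTableName='my-table',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'eu-west-1'}
    ]
)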

Upvotes: 0

tim-phillips

Reputation: 1087

I just used the python script, dynamodb-copy-table, making sure my credentials were in some environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), and it worked flawlessly. It even created the destination table for me.

python dynamodb-copy-table.py src_table dst_table

The default region is us-west-2, change it with the AWS_DEFAULT_REGION env variable.

Upvotes: 53

Alastair McCormack

Reputation: 27714

AWS Data Pipeline provides a template which can be used for this purpose: "CrossRegion DynamoDB Copy".

See: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-crossregion-ddb-create.html

The result is a simple pipeline.

Although it's called CrossRegion, you can easily use it within the same region as long as the destination table name is different (remember that table names must be unique per account and region).

Upvotes: 19

Chen Harel

Reputation: 10052

You can use Scan to read the data and save it to the new table.
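
A minimal boto3 sketch of that approach (table names are placeholders); batch_writer groups the puts into BatchWriteItem calls and retries unprocessed items:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
source = dynamodb.Table('source-table')
destination = dynamodb.Table('destination-table')

# Scan the source table page by page and batch-write each page into the destination.
scan_kwargs = {}
with destination.batch_writer() as batch:
    while True:
        page = source.scan(**scan_kwargs)
        for item in page['Items']:
            batch.put_item(Item=item)
        if 'LastEvaluatedKey' not in page:
            break
        scan_kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']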

On the AWS forums a guy from the AWS team posted another approach using EMR: How Do I Duplicate a Table?

Upvotes: 7
