Asfar Irshad

Reputation: 765

Exporting data from dynamo db to a csv file

We want to export data from DynamoDB to a file. We have around 150,000 records, each about 430 bytes. It would be a periodic activity, once a week. Can we do that with Lambda? Is it feasible given that Lambda has a maximum execution time of 15 minutes?

Is there a better option using Python or via the UI? I'm unable to export more than 100 records from the console UI.

Upvotes: 3

Views: 19680

Answers (5)

Rafael Marques

Reputation: 1874

You can export your data from DynamoDB in a number of ways.

The simplest way would be a full table scan:

import boto3

dynamodb = boto3.client('dynamodb')
your_table = 'your-table-name'  # name of the table to export

response = dynamodb.scan(
    TableName=your_table,
    Select='ALL_ATTRIBUTES')

data = response['Items']

# scan() returns at most 1 MB per call, so keep paginating
while 'LastEvaluatedKey' in response:
    response = dynamodb.scan(
        TableName=your_table,
        Select='ALL_ATTRIBUTES',
        ExclusiveStartKey=response['LastEvaluatedKey'])

    data.extend(response['Items'])

# save your data as csv here (see the sketch below)
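As a rough sketch of that last step (not part of the original answer), the scanned items can be flattened with boto3's TypeDeserializer and written with Python's csv module; the output file name and the way the header row is derived are assumptions:

import csv

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
# scan() returns DynamoDB-typed values ({'S': ...}, {'N': ...}), so flatten them first
rows = [{k: deserializer.deserialize(v) for k, v in item.items()} for item in data]

# collect every attribute name seen, since items may have different attributes
fieldnames = sorted({k for row in rows for k in row})

with open('export.csv', 'w', newline='') as f:  # assumed output file name
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)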

But if you want to do it every x days, I would recommend the following:

Create your first dump from your table with the code above.

Then, you can create a DynamoDB Streams trigger to a Lambda function that receives all your table changes (insert, update, delete), and append that data to your csv file. The code would be something like this:

def lambda_handler(event, context):
    for record in event['Records']:
        # record['eventName'] is INSERT, MODIFY or REMOVE and
        # record['dynamodb'] holds the changed item - save it here
        pass

Since you will receive only your table updates, you don't need to worry about Lambda's 15-minute execution limit.
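As a rough illustration of that approach (not from the original answer), the handler below flattens each stream record into a plain dict and writes each batch of changes as a small csv object in S3, since S3 objects cannot be appended in place; the bucket name and key pattern are assumptions:

import csv
import io

import boto3
from boto3.dynamodb.types import TypeDeserializer

s3 = boto3.client('s3')
deserializer = TypeDeserializer()

def lambda_handler(event, context):
    rows = []
    for record in event['Records']:
        image = record['dynamodb'].get('NewImage', {})
        row = {k: deserializer.deserialize(v) for k, v in image.items()}
        row['event'] = record['eventName']  # INSERT / MODIFY / REMOVE
        rows.append(row)

    if not rows:
        return

    # write this batch of changes as one small csv object
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted({k for r in rows for k in r}))
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket='your-export-bucket',                    # assumed bucket
                  Key='changes/' + context.aws_request_id + '.csv',  # assumed key pattern
                  Body=buf.getvalue().encode('utf-8'))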

You can read more about dynamodb streams and lambda here: DynamoDB Streams and AWS Lambda Triggers

And if you want to process the data further, you can always create an AWS Glue job or an EMR cluster.

Upvotes: 4

Sanjay KhanSsk

Reputation: 51

Here's some Python code to convert the data. For 15k rows it took 11 seconds for me; maybe not the best approach, but it works.

import time

import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.client('dynamodb')
your_table = "table_name"
deserializer = TypeDeserializer()
result = []

st = time.time()
response = dynamodb.scan(
    TableName=your_table,
    Select='ALL_ATTRIBUTES')

# deserialize the DynamoDB-typed attributes into plain Python values
for i in response['Items']:
    result.append({k: deserializer.deserialize(v) for k, v in i.items()})

# keep paginating until the scan is exhausted
while 'LastEvaluatedKey' in response:
    response = dynamodb.scan(
        TableName=your_table,
        Select='ALL_ATTRIBUTES',
        ExclusiveStartKey=response['LastEvaluatedKey'])

    for i in response['Items']:
        result.append({k: deserializer.deserialize(v) for k, v in i.items()})
et = time.time()

Then convert the result to a pandas DataFrame and export it:

import pandas as pd
df = pd.DataFrame(result)
df.to_excel("data.xlsx")
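Note that to_excel writes an .xlsx file (and needs openpyxl installed); since the question asks for csv, the same DataFrame can be written with:

df.to_csv("data.csv", index=False)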

Upvotes: 1

Asfar Irshad

Reputation: 765

We resolved it using AWS Lambda: 150,000 records (each about 430 bytes) are processed to a csv file in 2.2 minutes using the maximum available memory (3008 MB). We created an event rule to run it on a periodic basis. The time and size are noted so that anyone can estimate how much they can do with Lambda.
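For anyone looking for a starting point, here is a minimal sketch of what such a scheduled export Lambda could look like; this is not the asker's actual code, and the table name, bucket, and object key are placeholders:

import csv
import io

import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.client('dynamodb')
s3 = boto3.client('s3')
deserializer = TypeDeserializer()

def lambda_handler(event, context):
    rows = []
    kwargs = {'TableName': 'your-table', 'Select': 'ALL_ATTRIBUTES'}
    while True:
        response = dynamodb.scan(**kwargs)
        for item in response['Items']:
            rows.append({k: deserializer.deserialize(v) for k, v in item.items()})
        if 'LastEvaluatedKey' not in response:
            break
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

    # 150k rows of ~430 bytes fit comfortably in Lambda memory
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted({k for r in rows for k in r}))
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket='your-export-bucket',   # assumed bucket
                  Key='dynamodb-export.csv',     # assumed key
                  Body=buf.getvalue().encode('utf-8'))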

Upvotes: 1

Rahul Goel

Reputation: 880

You can refer to an existing question on Stack Overflow about exporting a DynamoDB table as a csv.

Upvotes: 0

F_SO_K

Reputation: 14869

One really simple option is to use the AWS Command Line Interface tools:

aws dynamodb scan --table-name YOURTABLE --output text > outputfile.txt

This gives you tab-delimited output. You can run it as a cron job for regular exports.
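For instance, a crontab entry for a weekly run could look like this (the schedule and output path are just placeholders):

0 3 * * 1 aws dynamodb scan --table-name YOURTABLE --output text > /path/to/outputfile.txt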

The scan wouldn't take anything like 15 minutes (probably just a few seconds). So you wouldn't need to worry about your Lambda timing out if you did it that way.

Upvotes: 3
