Can I create and populate a dynamodb table in a single Lambda function?

Question

My code essentially has two parts. Each works on its own, but they do not work together, so I think I have a syntax issue.

The first part creates the table and the second part populates it. The issue is that both parts share the table-name variable.


   import os
   import boto3
   import botocore.session

   region = os.environ.get('AWS_DEFAULT_REGION', 'us-east-2')
   session = botocore.session.get_session()
   dynamo = session.create_client('dynamodb', region_name=region) 


   s3 = boto3.client('s3')
   dynamodb = boto3.resource('dynamodb')

   def lambda_handler(event, context):

    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    obj = s3.get_object(Bucket=bucket, Key=key)

    rows = obj['Body'].read().decode("utf-8").split('\n')

    table = dynamodb.Table(key)
    dynamodb.create_table(
    TableName=key,
    KeySchema=[
        {
            'AttributeName': 'first',
            'KeyType': 'HASH'  #Partition key
        },
        {
            'AttributeName': 'last',
            'KeyType': 'RANGE'  #Sort key
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'first',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'last',
            'AttributeType': 'S'
        },

    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
        }
    )
    # Wait for the table to exist before writing to it
    print('Waiting for', key, '...')
    waiter = dynamo.get_waiter('table_exists')
    waiter.wait(TableName=key)
    with table.batch_writer() as batch:
        for row in rows:

            batch.put_item(Item={

                'first':row.split(',')[0],
                'last':row.split(',')[1],
                'age':row.split(',')[2],
                'date':row.split(',')[3]

            })

This runs as a Lambda function whenever a CSV is dropped into my S3 bucket.

After running, it succeeds in creating the table but does not populate it, ending with "Task timed out after 3.00 seconds". It starts again after a few seconds and returns "Table already exists", but the table remains empty.

If I run just the batch_writer part, it will populate the table as long as it already exists.

Matthew Pope · Accepted Answer

The short answer is that a new table takes some time to become active after create_table returns, and Waiter.TableExists uses a default polling interval of 20 seconds, which is causing your lambda function to time out.

But what's really happening?

Internally, Waiter.TableExists functions roughly like this pseudocode. (I've omitted error handling and other details for simplicity.)

function waitForTable(tableName):
    while true:
        if (dynamodb.describeTable(tableName).status == active):
            return
        else:
            sleep 20 seconds

Right after you create your table, you start the waiter. When the waiter calls describeTable, it sees that the table is not yet active, so it waits for 20 seconds. Your lambda timeout is set to 3 seconds, so after 3 seconds (before the waiter calls describeTable again) your lambda function gets terminated. (That's what the "task timed out" message means.)
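
To make that timing concrete, here is a minimal Python sketch of the same polling loop. It is an illustration only, not botocore's actual implementation: `wait_for_table`, `describe`, and `sleep` are hypothetical names, and `describe` stands in for something like `client.describe_table(TableName=name)['Table']['TableStatus']`. The defaults mirror the waiter's 20-second delay and 25 attempts. Error handling is left out, as in the pseudocode.

```python
import time


def wait_for_table(describe, table_name, delay=20, max_attempts=25, sleep=time.sleep):
    """Poll until describe(table_name) returns 'ACTIVE'; return the attempt count.

    `describe` is a hypothetical callable that returns the table status string.
    """
    for attempt in range(1, max_attempts + 1):
        if describe(table_name) == "ACTIVE":
            return attempt
        if attempt < max_attempts:
            # With the default delay, this single sleep already outlasts
            # a 3-second Lambda timeout.
            sleep(delay)
    raise TimeoutError(f"{table_name} not ACTIVE after {max_attempts} attempts")
```

If the first status check sees a table that is still being created, the loop asks to sleep for `delay` seconds before checking again, so a 3-second Lambda never reaches the second check.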

Then, when your lambda function is retried, the table is now active, so when your lambda function reaches the dynamodb.create_table(...) call, DynamoDB will respond with an error because the table already exists. (Hence the "Table already exists" error message.)

How do I fix it?

There are several things you can do to fix this, and the "most correct" solution is probably to do all of them.

1. You could set the delay time of the waiter to a lower number, such as 1 second, like this: waiter.wait(TableName=key, WaiterConfig={'Delay': 1})
2. You can increase the timeout of your lambda function. The combination of creating the table, reading the S3 file, and writing it all to DynamoDB could take more than 3 seconds. Pick a number that gives your lambda function time to recover if a request needs to be retried. If your function works for a file with only 1-2 rows, but fails for larger files, I'd suggest trying 5 seconds, and if that doesn't reliably succeed, increase to 10 seconds. If the files could be very large, you should consider using something other than Lambda.
3. Assuming you're not concerned about overwriting data in a pre-existing table, you should check if the table already exists before you try to create it (or try to create it and ignore the ResourceInUseException that is raised if it already exists). See this other SO Answer for How to check if a DynamoDB table exists, which explains multiple ways to check if the table exists, including code samples for each one.
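
As a rough sketch of how the shorter waiter delay and the "ignore the already-exists error" approach could fit together, the helper below takes the DynamoDB client as a parameter. `ensure_table` is a hypothetical name, the key schema is copied from the question, and `max_attempts=60` is an arbitrary value chosen for illustration. It assumes a boto3 or botocore DynamoDB client, which exposes the error class as `client.exceptions.ResourceInUseException`.

```python
def ensure_table(client, table_name, delay=1, max_attempts=60):
    """Create table_name if it is missing, then block until it is ACTIVE.

    `client` is expected to be a DynamoDB client, e.g. boto3.client('dynamodb').
    Returns True if this call created the table, False if it already existed.
    """
    try:
        client.create_table(
            TableName=table_name,
            KeySchema=[
                {"AttributeName": "first", "KeyType": "HASH"},   # partition key
                {"AttributeName": "last", "KeyType": "RANGE"},   # sort key
            ],
            AttributeDefinitions=[
                {"AttributeName": "first", "AttributeType": "S"},
                {"AttributeName": "last", "AttributeType": "S"},
            ],
            ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        )
        created = True
    except client.exceptions.ResourceInUseException:
        # The table already exists, for example on a retried invocation.
        created = False

    # Poll every `delay` seconds instead of the waiter's default 20.
    client.get_waiter("table_exists").wait(
        TableName=table_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    return created
```

In the handler from the question, a call such as `ensure_table(dynamo, key)` would take the place of the `create_table` call and the waiter lines, and the `batch_writer` block would follow it. The Lambda timeout still needs to be raised separately.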
