I essentially have two parts to my code; both work on their own, but not together, so I think I have a syntax issue.
The first part creates the table and the second part populates it. The catch is that both parts share the table-name variable.
```python
import os
import boto3
import botocore.session

region = os.environ.get('AWS_DEFAULT_REGION', 'us-east-2')
session = botocore.session.get_session()
dynamo = session.create_client('dynamodb', region_name=region)
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=key)
    rows = obj['Body'].read().decode("utf-8").split('\n')
    table = dynamodb.Table(key)
    dynamodb.create_table(
        TableName=key,
        KeySchema=[
            {
                'AttributeName': 'first',
                'KeyType': 'HASH'  # Partition key
            },
            {
                'AttributeName': 'last',
                'KeyType': 'RANGE'  # Sort key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'first',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'last',
                'AttributeType': 'S'
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
    # Wait for the table to exist before exiting
    print('Waiting for', key, '...')
    waiter = dynamo.get_waiter('table_exists')
    waiter.wait(TableName=key)
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item={
                'first': row.split(',')[0],
                'last': row.split(',')[1],
                'age': row.split(',')[2],
                'date': row.split(',')[3]
            })
```
This runs as a Lambda function whenever a CSV file is dropped into my S3 bucket.
It succeeds in creating the table but does not populate it, ending with "Task timed out after 3.00 seconds". A few seconds later it starts again and returns "Table already exists", but the table remains empty.
If I run just the batch_writer part, it populates the table, as long as the table already exists.
The short answer is that a new table usually takes roughly a second to become active, but `Waiter.TableExists` uses a default polling interval of 20 seconds, which is what makes your Lambda function time out.
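To make the timing concrete, here is a back-of-the-envelope sketch of when a fixed-delay waiter would call `DescribeTable` before the function is killed. It assumes the waiter checks once immediately and then sleeps a fixed delay between checks, and that each call returns instantly; `poll_times_before_timeout` is a hypothetical helper, not part of boto3.

```python
def poll_times_before_timeout(delay_s: float, timeout_s: float) -> list[float]:
    """Seconds (after the waiter starts) at which DescribeTable would be called
    before the Lambda timeout, assuming instant calls and a first check at t = 0."""
    times = []
    t = 0.0
    while t < timeout_s:
        times.append(t)
        t += delay_s  # the waiter sleeps a fixed delay between checks
    return times


# Default 20-second delay vs. a 3-second Lambda timeout: only one check.
print(poll_times_before_timeout(20, 3))  # [0.0]
# With a 1-second delay there are three chances to see the table become ACTIVE.
print(poll_times_before_timeout(1, 3))   # [0.0, 1.0, 2.0]
```

With the default delay, the only check happens while the table is still being created, and the next one would fall long after the 3-second limit.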
But what's really happening?
Internally, `Waiter.TableExists` functions roughly like the following pseudocode (I've omitted error handling and other details for simplicity):
```
function waitForTable(tableName):
    while true:
        if (dynamodb.describeTable(tableName).status == active):
            return
        else:
            sleep 20 seconds
```
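A rough, runnable Python equivalent of that pseudocode might look like this. It is a simplified sketch, not botocore's actual waiter code: `wait_for_table` and `FakeClient` are hypothetical names, the client only needs a boto3-style `describe_table` method, `max_attempts=25` is an illustrative cap, and `sleep` is injectable so the loop can be exercised without really waiting.

```python
import time


def wait_for_table(client, table_name, delay=20, max_attempts=25, sleep=time.sleep):
    """Poll DescribeTable until the table is ACTIVE (error handling omitted).

    Returns the number of DescribeTable calls it took.
    """
    for attempt in range(1, max_attempts + 1):
        status = client.describe_table(TableName=table_name)["Table"]["TableStatus"]
        if status == "ACTIVE":
            return attempt
        if attempt < max_attempts:
            # With delay=20 this sleep outlasts a 3-second Lambda timeout.
            sleep(delay)
    raise TimeoutError(f"{table_name} not ACTIVE after {max_attempts} attempts")


class FakeClient:
    """Hypothetical stand-in for a DynamoDB client: CREATING twice, then ACTIVE."""

    def __init__(self):
        self.statuses = ["CREATING", "CREATING", "ACTIVE"]

    def describe_table(self, TableName):
        return {"Table": {"TableName": TableName, "TableStatus": self.statuses.pop(0)}}


slept = []
print(wait_for_table(FakeClient(), "people.csv", delay=1, sleep=slept.append))  # 3
print(slept)  # [1, 1]
```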
Right after you create your table, you start the waiter. When the waiter calls `describeTable`, it sees that the table is not yet active, so it waits for 20 seconds. Your Lambda timeout is set to 3 seconds, so after 3 seconds (before the waiter calls `describeTable` again) your Lambda function gets terminated. (That's what the "task timed out" message means.)
Then, when your Lambda function is retried (S3 invokes Lambda asynchronously, and Lambda retries failed asynchronous invocations), the table is now active, so when your function reaches the `dynamodb.create_table(...)` call, DynamoDB responds with an error because the table already exists. (Hence the "Table already exists" error message.)
How do I fix it?
There are several things you can do to fix this, and the "most correct" solution is probably to do all of them.
- Make the waiter poll more often by passing a shorter delay: `waiter.wait(TableName=key, WaiterConfig={'Delay': 1})`
- Check whether the table exists before creating it (or catch the `ResourceInUseException` that DynamoDB raises if it already exists). See this other SO answer, "How to check if a DynamoDB table exists", which explains several ways to check whether a table exists, with code samples for each.
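Putting both fixes together, the handler could be restructured roughly as follows. This is one possible arrangement, not the original poster's code: `ensure_table` and `load_rows` are hypothetical helper names, the sketch uses the low-level client's `create_table` (same parameters as the resource method in the question), and it catches `client.exceptions.ResourceInUseException`, which is the error code DynamoDB returns for "Table already exists". It also skips blank lines, such as the empty string that `split('\n')` leaves after a trailing newline.

```python
def ensure_table(client, table_name):
    """Create the table if needed, then block until it is ACTIVE.

    Safe on a Lambda retry: if an earlier invocation already created the table,
    DynamoDB raises ResourceInUseException, which is ignored here.
    """
    try:
        client.create_table(
            TableName=table_name,
            KeySchema=[
                {"AttributeName": "first", "KeyType": "HASH"},  # partition key
                {"AttributeName": "last", "KeyType": "RANGE"},  # sort key
            ],
            AttributeDefinitions=[
                {"AttributeName": "first", "AttributeType": "S"},
                {"AttributeName": "last", "AttributeType": "S"},
            ],
            ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        )
    except client.exceptions.ResourceInUseException:
        pass  # table already exists (or is still being created)
    # Poll every second instead of the TableExists default of 20 seconds.
    client.get_waiter("table_exists").wait(
        TableName=table_name, WaiterConfig={"Delay": 1}
    )


def load_rows(table, lines):
    """Write 'first,last,age,date' CSV lines using a DynamoDB batch writer."""
    with table.batch_writer() as batch:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines, e.g. after a trailing newline
            first, last, age, date = line.strip().split(",")[:4]
            batch.put_item(Item={"first": first, "last": last, "age": age, "date": date})


def lambda_handler(event, context):
    # boto3 is imported inside the handler so the helpers above can be
    # exercised with stand-ins where boto3 isn't installed; a top-level
    # import is the more common choice in Lambda.
    import boto3

    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    lines = body.read().decode("utf-8").split("\n")

    ensure_table(boto3.client("dynamodb"), key)
    load_rows(boto3.resource("dynamodb").Table(key), lines)
```

Even with a 1-second polling delay, table creation plus the batch writes still have to fit inside the function's timeout, so you may also need to raise the Lambda timeout above its 3-second default.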