Franck Dernoncourt

Reputation: 83137

Empty a DynamoDB table with boto

How can I optimally (in terms of financial cost) empty a DynamoDB table with boto? (As we can do in SQL with a TRUNCATE statement.)

boto.dynamodb2.table.delete() or boto.dynamodb2.layer1.DynamoDBConnection.delete_table() deletes the entire table, while boto.dynamodb2.table.delete_item() and boto.dynamodb2.table.BatchTable.delete_item() only delete the specified items.
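
For reference, here is roughly what those calls look like (untested sketch; the 'users' table and 'username' key are placeholders):

from boto.dynamodb2.table import Table

users = Table('users')

# Deletes the entire table.
users.delete()

# Deletes a single item, identified by its key.
users.delete_item(username='johndoe')

# Deletes items in batches via BatchTable.delete_item.
with users.batch_write() as batch:
    batch.delete_item(username='johndoe')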

Upvotes: 13

Views: 20996

Answers (4)

sedeh

Reputation: 7313

This builds on the answer given by Persistent Plants. If the table already exists, you can extract its definition and use it to recreate the table.

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-2')

def delete_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    return table.delete()


def create_table_ddb(table_name, key_schema, attribute_definitions,
                     provisioned_throughput, stream_enabled, billing_mode):
    settings = dict(
        TableName=table_name,
        KeySchema=key_schema,
        AttributeDefinitions=attribute_definitions,
        StreamSpecification={'StreamEnabled': stream_enabled},
        BillingMode=billing_mode
    )
    if billing_mode == 'PROVISIONED':
        settings['ProvisionedThroughput'] = provisioned_throughput
    return dynamodb.create_table(**settings)


def truncate_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    # Capture the existing table definition before deleting the table.
    key_schema = table.key_schema
    attribute_definitions = table.attribute_definitions
    # BillingModeSummary is only present for on-demand tables, so its
    # presence is treated here as meaning PAY_PER_REQUEST.
    if table.billing_mode_summary:
        billing_mode = 'PAY_PER_REQUEST'
    else:
        billing_mode = 'PROVISIONED'
    # StreamSpecification is only present when a stream is configured.
    if table.stream_specification:
        stream_enabled = True
    else:
        stream_enabled = False
    # Keep only the capacity fields that create_table accepts.
    capacity = ['ReadCapacityUnits', 'WriteCapacityUnits']
    provisioned_throughput = {k: v for k, v in table.provisioned_throughput.items() if k in capacity}
    delete_table_ddb(table_name)
    table.wait_until_not_exists()
    return create_table_ddb(
        table_name,
        key_schema=key_schema,
        attribute_definitions=attribute_definitions,
        provisioned_throughput=provisioned_throughput,
        stream_enabled=stream_enabled,
        billing_mode=billing_mode
    )

Now call the function:

table_name = 'test_ddb'
truncate_table_ddb(table_name)

Upvotes: 1

Ethan Harris

Reputation: 1362

While I agree with Johnny Wu that dropping the table and recreating it is much more efficient, there are cases, such as when many GSIs or trigger events are associated with a table, where you don't want to have to re-associate them. The script below repeatedly scans the table (following LastEvaluatedKey) and uses the batch writer to delete every item. For very large tables, though, this may not work, as it requires the keys of all items to be held in memory.

import boto3
dynamo = boto3.resource('dynamodb')

def truncateTable(tableName):
    table = dynamo.Table(tableName)
    
    #get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]
    
    """
    NOTE: there are reserved attributes for key names, please see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
    if a hash or range key is in the reserved word list, you will need to use the ExpressionAttributeNames parameter
    described at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan
    """

    #Only retrieve the keys for each item in the table (minimize data transfer)
    ProjectionExpression = ", ".join(tableKeyNames)
    
    response = table.scan(ProjectionExpression=ProjectionExpression)
    data = response.get('Items')
    
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=ProjectionExpression, 
            ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])

    with table.batch_writer() as batch:
        for each in data:
            batch.delete_item(
                Key={key: each[key] for key in tableKeyNames}
            )
            
truncateTable("YOUR_TABLE_NAME")

Upvotes: 13

Persistent Plants

Reputation: 829

As Johnny Wu mentioned, deleting a table and re-creating it is more efficient than deleting individual items. You should make sure your code doesn't try to create a new table before it is completely deleted.

import boto3

client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')


def deleteTable(table_name):
    print('deleting table')
    return client.delete_table(TableName=table_name)


def createTable(table_name):
    waiter = client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print('creating table')
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions= [
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        StreamSpecification={
            'StreamEnabled': False
        }
    )


def emptyTable(table_name):
    deleteTable(table_name)
    createTable(table_name)

Upvotes: 11

Johnny Wu

Reputation: 862

Deleting a table is much more efficient than deleting items one-by-one. If you are able to control your truncation points, then you can do something similar to rotating tables as suggested in the docs for time series data.
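
For example, a rough sketch of the rotating-tables approach with boto3 (the per-month naming scheme, 'events_' prefix, and 'event_id' key are just placeholders):

import boto3

dynamodb = boto3.resource('dynamodb')

def table_name_for(month):
    # e.g. 'events_2024_01' -- the naming convention is an assumption
    return 'events_{:%Y_%m}'.format(month)

def create_month_table(month):
    # New period, new table; writes go to the current period's table.
    return dynamodb.create_table(
        TableName=table_name_for(month),
        KeySchema=[{'AttributeName': 'event_id', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'event_id', 'AttributeType': 'S'}],
        BillingMode='PAY_PER_REQUEST'
    )

def drop_expired_table(month):
    # "Truncation" becomes deleting the whole table for an expired period,
    # which avoids any per-item write cost.
    return dynamodb.Table(table_name_for(month)).delete()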

Upvotes: 3
