Reputation: 83137
How can I optimally (in terms of financial cost) empty a DynamoDB table with boto? (As we can do in SQL with a truncate statement.)
boto.dynamodb2.table.delete()
or boto.dynamodb2.layer1.DynamoDBConnection.delete_table()
deletes the entire table, while boto.dynamodb2.table.delete_item()
or boto.dynamodb2.table.BatchTable.delete_item()
only deletes the specified items.
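For reference, these are roughly the two options I see with the legacy boto.dynamodb2 API (the table name and key name below are placeholders):

from boto.dynamodb2.table import Table

table = Table('mytable')  # placeholder table name

# Option 1: drop the whole table (it would have to be re-created afterwards)
table.delete()

# Option 2: delete items one by one via the batch writer
with table.batch_write() as batch:
    for item in table.scan():
        batch.delete_item(id=item['id'])  # 'id' is a placeholder hash key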
Upvotes: 13
Views: 20996
Reputation: 7313
This builds on the answer given by Persistent Plants. If the table already exists, you can extract the table definition and use it to recreate the table.
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-2')


def delete_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    return table.delete()


def create_table_ddb(table_name, key_schema, attribute_definitions,
                     provisioned_throughput, stream_enabled, billing_mode):
    settings = dict(
        TableName=table_name,
        KeySchema=key_schema,
        AttributeDefinitions=attribute_definitions,
        StreamSpecification={'StreamEnabled': stream_enabled},
        BillingMode=billing_mode
    )
    if billing_mode == 'PROVISIONED':
        settings['ProvisionedThroughput'] = provisioned_throughput
    return dynamodb.create_table(**settings)


def truncate_table_ddb(table_name):
    table = dynamodb.Table(table_name)
    # Capture the existing table definition before deleting the table
    key_schema = table.key_schema
    attribute_definitions = table.attribute_definitions
    if table.billing_mode_summary:
        billing_mode = 'PAY_PER_REQUEST'
    else:
        billing_mode = 'PROVISIONED'
    if table.stream_specification:
        stream_enabled = True
    else:
        stream_enabled = False
    # Only the capacity units are valid inputs to create_table
    capacity = ['ReadCapacityUnits', 'WriteCapacityUnits']
    provisioned_throughput = {k: v for k, v in table.provisioned_throughput.items() if k in capacity}
    delete_table_ddb(table_name)
    table.wait_until_not_exists()
    return create_table_ddb(
        table_name,
        key_schema=key_schema,
        attribute_definitions=attribute_definitions,
        provisioned_throughput=provisioned_throughput,
        stream_enabled=stream_enabled,
        billing_mode=billing_mode
    )
Now call the function:
table_name = 'test_ddb'
truncate_table_ddb(table_name)
Upvotes: 1
Reputation: 1362
While I agree with Johnny Wu that dropping the table and recreating it is much more efficient, there may be cases, such as when many GSIs or trigger events are associated with a table, where you don't want to have to re-associate those. The script below scans the table page by page and uses the batch writer to delete all items in the table. For massively large tables, though, this may not work, as it requires all items in the table to be loaded into memory.
import boto3

dynamo = boto3.resource('dynamodb')


def truncateTable(tableName):
    table = dynamo.Table(tableName)

    # get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]

    """
    NOTE: there are reserved attributes for key names, please see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
    if a hash or range key is in the reserved word list, you will need to use the ExpressionAttributeNames parameter
    described at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan
    """

    # Only retrieve the keys for each item in the table (minimize data transfer)
    ProjectionExpression = ", ".join(tableKeyNames)

    response = table.scan(ProjectionExpression=ProjectionExpression)
    data = response.get('Items')

    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=ProjectionExpression,
            ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])

    with table.batch_writer() as batch:
        for each in data:
            batch.delete_item(
                Key={key: each[key] for key in tableKeyNames}
            )


truncateTable("YOUR_TABLE_NAME")
Upvotes: 13
Reputation: 829
As Johnny Wu mentioned, deleting a table and re-creating it is more efficient than deleting individual items. You should make sure your code doesn't try to create the new table before the old one has been completely deleted.
import boto3

client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')


def deleteTable(table_name):
    print('deleting table')
    return client.delete_table(TableName=table_name)


def createTable(table_name):
    # Wait until the old table is fully deleted before re-creating it
    waiter = client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print('creating table')
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        StreamSpecification={
            'StreamEnabled': False
        }
    )


def emptyTable(table_name):
    deleteTable(table_name)
    createTable(table_name)
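As in the other answers, the helper can then be called with your table name (placeholder below):

emptyTable('YOUR_TABLE_NAME')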
Upvotes: 11
Reputation: 862
Deleting a table is much more efficient than deleting items one-by-one. If you are able to control your truncation points, then you can do something similar to rotating tables as suggested in the docs for time series data.
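A rough sketch of that rotation idea (the monthly naming scheme, key layout, and billing mode below are just assumptions for illustration): write into a table named for the current period, and "truncate" by simply dropping the previous period's table once it has expired.

import datetime
import boto3

dynamodb = boto3.resource('dynamodb')


def table_name_for(period):
    # Hypothetical naming scheme: one table per month, e.g. events_2023_09
    return 'events_{:%Y_%m}'.format(period)


def rotate_tables(now=None):
    now = now or datetime.datetime.utcnow()
    current = table_name_for(now)
    previous = table_name_for(now.replace(day=1) - datetime.timedelta(days=1))

    existing = [t.name for t in dynamodb.tables.all()]

    # Create the current period's table if it does not exist yet
    if current not in existing:
        dynamodb.create_table(
            TableName=current,
            KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
            AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
            BillingMode='PAY_PER_REQUEST'
        ).wait_until_exists()

    # "Truncation" is just dropping the table for the expired period
    if previous in existing:
        dynamodb.Table(previous).delete()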
Upvotes: 3