Reputation: 2546
I need to bulk insert records into a DynamoDB table on a weekly basis. I do this by dropping the table, creating a new table with On Demand capacity, then using BatchWriteItem to populate it. According to the documentation, newly created tables with On Demand capacity can serve up to 4,000 WCUs. No matter what I try, though, the most I can get is 1,487 WCUs. I have tried the following:
Although the throughput differs from experiment to experiment and from execution to execution, 1,487 WCUs comes up often enough that there may be some significance to that number.
What do I need to do to leverage the full 4,000 WCUs available to me?
Upvotes: 1
Views: 273
Reputation: 13117
Your limitation appears to be on the writer side. I've written a small Python script to create and load-test a table.
We can see that DynamoDB easily scales up to 4,000 WRU with 8 worker processes, then throttles briefly and afterwards scales up again. To get more throughput, I'd have to add more writer processes:
Here is the script for your convenience:
import multiprocessing
import uuid

import boto3
from botocore.exceptions import ClientError

TABLE = "speed-measurement"
NUMBER_OF_WORKERS = 8


def create_table_if_not_exists(table_name: str):
    try:
        boto3.client("dynamodb").create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=table_name,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ResourceInUseException":
            pass  # table already exists
        else:
            raise


def write_fast(worker_num):
    table = boto3.resource("dynamodb").Table(TABLE)
    counter = 0
    with table.batch_writer() as batch:
        while True:
            counter += 1
            batch.put_item(
                Item={
                    "PK": str(uuid.uuid4())
                }
            )
            if counter % 1000 == 0:
                print(f"Worker #{worker_num} wrote item #{counter}")


def main():
    create_table_if_not_exists(TABLE)
    with multiprocessing.Pool(NUMBER_OF_WORKERS) as pool:
        pool.map(write_fast, range(NUMBER_OF_WORKERS))


if __name__ == "__main__":
    main()
Just run it with Python 3 and stop it with Ctrl+C once you see the desired metrics. It creates the table and writes as fast as it can across 8 processes; you can also increase NUMBER_OF_WORKERS.
Source for the CloudWatch Graphic:
{
"metrics": [
[ { "expression": "m2/60", "label": "Write Request Units", "id": "e1", "color": "#2ca02c" } ],
[ "AWS/DynamoDB", "WriteThrottleEvents", "TableName", "speed-measurement", { "yAxis": "right", "id": "m1" } ],
[ ".", "ConsumedWriteCapacityUnits", ".", ".", { "stat": "Sum", "period": 1, "id": "m2", "visible": false } ]
],
"view": "timeSeries",
"stacked": false,
"region": "eu-central-1",
"stat": "Maximum",
"period": 60,
"yAxis": {
"left": {
"label": "Consumed Write Request Units",
"showUnits": false
},
"right": {
"label": "Write Throttle Events",
"showUnits": false
}
},
"annotations": {
"horizontal": [
{
"color": "#9edae5",
"label": "Initial Limit",
"value": 4000,
"fill": "below"
}
]
},
"legend": {
"position": "bottom"
},
"setPeriodToTimeRange": true
}
Upvotes: 2