Teppic

Reputation: 2546

How to maximize WRUs in DynamoDB?

I need to bulk insert records into a DynamoDB table on a weekly basis. I do this by dropping the table, creating a new table with On Demand capacity, and then using BatchWriteItem to populate it. According to the documentation, newly created On Demand tables can serve up to 4,000 write request units (WRUs). No matter what I try, though, the most I can get is 1,487 WRUs. I have tried the following:

Although throughput differs from experiment to experiment and from run to run, 1,487 WRUs comes up often enough that there may be some significance to it.

What do I need to do to leverage the full 4,000 WRUs available to me?
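For reference, here is a minimal sketch of the weekly flow described above (table name, key schema, and the example items are placeholders, not my real data):

import boto3

TABLE = "weekly-import"  # placeholder name

dynamodb = boto3.client("dynamodb")

# Drop last week's table (assumed to exist) and wait until it is gone
dynamodb.delete_table(TableName=TABLE)
dynamodb.get_waiter("table_not_exists").wait(TableName=TABLE)

# Recreate it with On Demand capacity
dynamodb.create_table(
    TableName=TABLE,
    AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
dynamodb.get_waiter("table_exists").wait(TableName=TABLE)

# BatchWriteItem takes at most 25 items per call, in the low-level
# typed format, e.g. {"PK": {"S": "some-id"}}
items = [{"PK": {"S": f"record-{n}"}} for n in range(100)]
for i in range(0, len(items), 25):
    response = dynamodb.batch_write_item(
        RequestItems={
            TABLE: [{"PutRequest": {"Item": item}} for item in items[i:i + 25]]
        }
    )
    # A production loop should also retry response["UnprocessedItems"]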

Upvotes: 1

Views: 273

Answers (1)

Maurice

Reputation: 13117

Your limitation appears to be on the writer side. I've written a small Python script to create and load-test a table.

We can see that DynamoDB easily scales up to 4,000 WRUs with 8 worker processes, then throttles briefly and afterwards scales up again. To get more throughput, I'd have to add more writer processes:

[Graph: Write Request Units vs. Throttle Events]

Here is the script for your convenience:

import multiprocessing
import uuid

import boto3

from botocore.exceptions import ClientError

TABLE = "speed-measurement"
NUMBER_OF_WORKERS = 8

def create_table_if_not_exists(table_name: str):
    """Create an On Demand table with a single string hash key."""
    try:
        boto3.client("dynamodb").create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=table_name,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ResourceInUseException":
            # Table already exists
            pass
        else:
            raise

def write_fast(worker_num):
    """Write items with random keys as fast as possible."""
    table = boto3.resource("dynamodb").Table(TABLE)

    counter = 0

    # batch_writer buffers items and sends them in BatchWriteItem
    # calls of up to 25 items, retrying unprocessed items for us
    with table.batch_writer() as batch:
        while True:
            counter += 1

            batch.put_item(
                Item={
                    "PK": str(uuid.uuid4())
                }
            )

            if counter % 1000 == 0:
                print(f"Worker: #{worker_num} Wrote item #{counter}")

def main():
    create_table_if_not_exists(TABLE)
    
    with multiprocessing.Pool(NUMBER_OF_WORKERS) as pool:
        pool.map(write_fast, range(NUMBER_OF_WORKERS))

if __name__ == "__main__":
    main()

Just run it with Python 3 and stop it with Ctrl+C once you see the desired metrics. It creates the table and then writes as fast as it can in 8 processes; you can raise NUMBER_OF_WORKERS to add more.

Source for the CloudWatch graph:

{
    "metrics": [
        [ { "expression": "m2/60", "label": "Write Request Units", "id": "e1", "color": "#2ca02c" } ],
        [ "AWS/DynamoDB", "WriteThrottleEvents", "TableName", "speed-measurement", { "yAxis": "right", "id": "m1" } ],
        [ ".", "ConsumedWriteCapacityUnits", ".", ".", { "stat": "Sum", "period": 1, "id": "m2", "visible": false } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "eu-central-1",
    "stat": "Maximum",
    "period": 60,
    "yAxis": {
        "left": {
            "label": "Consumed Write Request Units",
            "showUnits": false
        },
        "right": {
            "label": "Write Throttle Events",
            "showUnits": false
        }
    },
    "annotations": {
        "horizontal": [
            {
                "color": "#9edae5",
                "label": "Initial Limit",
                "value": 4000,
                "fill": "below"
            }
        ]
    },
    "legend": {
        "position": "bottom"
    },
    "setPeriodToTimeRange": true
}
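
If you'd rather pull the same numbers with boto3 instead of the console, here is a minimal sketch that queries the metric behind the graph (region and table name are taken from the definition above; the one-hour window is just an example):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-central-1")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "speed-measurement"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Sum"],
)

# Sum over 60 seconds divided by 60 gives average WRUs per second,
# the same m2/60 expression the graph uses
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"] / 60)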

Upvotes: 2
