Alexander Gubarets

Reputation: 763

DynamoDB: filling an empty table with tons of data is capped at 1000 WCU



I'm writing a script that should fill a new table with data as quickly as possible (the table is ~650 GB). The partition (hash) key differs between records, so I can't think of a better key. I've set the provisioned WCU for this table to 4k.

While the script runs, 16 independent threads put different data into the table at a high rate. During execution, I receive ProvisionedThroughputExceededException. The CloudWatch graphs show that consumed WCU is capped at 1000 WCU.

This could happen if all the data goes to one partition. As I understand it, DynamoDB creates a new partition when the data size exceeds the 10 GB limit. Is that right? If so, during this data fill I have only 1 partition, and the 1000 WCU limit is understandable.

I've checked https://aws.amazon.com/ru/premiumsupport/knowledge-center/dynamodb-table-throttled/
but it seems these suggestions apply to tables that are already filled, when you try to add a lot of new data to them.

So I have 3 questions:
1. How can I speed up inserting data into a new, empty table?
2. When does DynamoDB decide to create a new partition?
3. Can I set a minimum number of partitions (e.g. 4) to use all of the provisioned WCU (4k)?

UPD: CloudWatch graph (image not available)


UPD2: The HASH key is a long number. Actually, it's not strictly unique, but at most 2 rows share the same HASH key (with different RANGE keys).

Upvotes: 2

Views: 200

Answers (1)

Charles

Reputation: 23823

You can't manually specify the number of partitions used by DDB. It's automatically handled behind the scenes.

However, the way it's handled is laid out in the link provided by F_SO_K.

  • 1 partition for every 10 GB of data
  • 1 partition for every 3000 RCU and/or 1000 WCU provisioned
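To make the two rules above concrete, here is a rough sketch of the arithmetic. It assumes the rules combine as "take the larger of the size-based and throughput-based counts", which is one common reading and not something DynamoDB exposes or guarantees. The function name `estimate_partitions` and its parameters are made up for illustration.

```python
import math


def estimate_partitions(table_size_gb, rcu, wcu):
    """Rough partition estimate from the two rules of thumb above.

    Assumption: the real allocation is internal to DynamoDB and may differ;
    this only reproduces the back-of-the-envelope calculation.
    """
    # Rule 1: one partition per 10 GB of stored data.
    by_size = math.ceil(table_size_gb / 10)
    # Rule 2: one partition per 3000 RCU and/or 1000 WCU provisioned.
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    # A table always has at least one partition.
    return max(by_size, by_throughput, 1)


# Empty table provisioned at 4000 WCU (RCU left at 0 for simplicity).
print(estimate_partitions(0, 0, 4000))
# The same table once it holds ~650 GB.
print(estimate_partitions(650, 0, 4000))
```

Under these assumptions, the empty table would already get 4 partitions from the throughput rule alone, which is why a 1000 WCU cap is surprising.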

If you've provisioned 4000 WCU, then you should have at least 4 partitions, and you should be seeing 4000 WCU consumed. Especially given that you said your hash key is unique for every record, your data should be spread out uniformly and you shouldn't be running into a "hot" partition.

You mentioned CloudWatch showing consumed WCU at 1000. Does CloudWatch also show provisioned capacity at 4000 WCU?
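Besides the CloudWatch graph, the provisioned value can be read straight from the table description. A minimal sketch, assuming a boto3 DynamoDB client and a placeholder table name `my-table` (both are assumptions, not from the question); the helper `provisioned_wcu` is hypothetical and accepts any object with a `describe_table` method.

```python
def provisioned_wcu(client, table_name):
    """Return the table's provisioned write capacity units.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    response = client.describe_table(TableName=table_name)
    return response["Table"]["ProvisionedThroughput"]["WriteCapacityUnits"]


# Usage (requires boto3 and AWS credentials; "my-table" is a placeholder):
#   import boto3
#   print(provisioned_wcu(boto3.client("dynamodb"), "my-table"))
```

If this prints 1000 rather than 4000, the capacity change never took effect, which would explain the cap.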

If so, I'm not sure what's going on; you may have to contact AWS support.

Upvotes: 1
