JohnyMotorhead

Reputation: 821

Proper partition key in AWS DynamoDB table

I'm very new to NoSQL databases, and to DynamoDB in particular. We're implementing a .NET application that will parse CSV files and store their rows as separate records in an AWS DynamoDB table. Each file can contain from 50 to 20,000 rows.

In the relational SQL database that we use, the table would have the following structure:

[dbo].[FileRecords]

Now I need to implement it in AWS DynamoDB. I've been reading the DynamoDB documentation and sketched the following table structure:

            Primary key
Partition_key   |    Sort_key           | Data
FileId          |    Status_recordId    | Json

So the data will look like this:

File1       Processing_record1
File1       Processing_record2
File1       Error_record3
File1       Error_record4

File2       Processing_record1
File2       Processing_record2
....
File2       Processing_record5000
File2       Processing_record5001
File2       Processing_record5002

FileId is the partition key of the primary key, and Sort_key is a composite: the concatenation of status + unique row identifier. Our DynamoDB capacity mode is "On Demand".
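For illustration, here is a minimal Python sketch of this key layout and of a lookup by file id and status prefix (Python only for brevity; the application itself is .NET). It only builds the request parameters in the shape of the low-level DynamoDB Query API and does not call AWS. The table name `FileRecords` and the helper names `make_key` and `build_query` are assumptions made for this example.

```python
def make_key(file_id: str, status: str, record_id: str) -> dict:
    """Build the primary key of one CSV row in low-level DynamoDB JSON."""
    return {
        "FileId": {"S": file_id},
        # Composite sort key: status + "_" + unique row identifier.
        "Status_recordId": {"S": f"{status}_{record_id}"},
    }


def build_query(table_name: str, file_id: str, status: str) -> dict:
    """Build Query parameters for all records of one file with one status."""
    return {
        "TableName": table_name,
        # Placeholders (#pk, #sk) avoid any clash with reserved words.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "FileId", "#sk": "Status_recordId"},
        "ExpressionAttributeValues": {
            ":pk": {"S": file_id},
            # The trailing "_" keeps "Processing" from also matching a
            # longer status that merely starts with the same letters.
            ":prefix": {"S": f"{status}_"},
        },
    }


params = build_query("FileRecords", "File1", "Processing")
```

With boto3, such a dictionary could be passed as `client.query(**params)`; a Query reads from a single partition key value, so this access pattern never scans the table.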

Questions I would like someone to help me with:

  1. I will only use queries to retrieve a collection of records by file id and status, i.e. where PK = "File1" and SK begins with "Processing". In this case, will this table and these keys be effective?

  2. If File1 contains 100 records and File2 contains 10000 records, will the data be distributed correctly across partitions?

  3. Should I use a "key-sharding" plan (given that the mode is "On Demand"), which means adding an index (from 1 to 100) to the partition key, so that my DynamoDB table will contain the following data:

File1_1     Processing_record1
File1_1     Processing_record2
File1_1     Error_record3
File1_1     Error_record4

File2_1     Processing_record1
File2_1     Processing_record2
....
File2_1     Processing_record5000
File2_2     Processing_record5001
File2_2     Processing_record5002

  4. Is it possible for me to catch a ProvisionedThroughputExceededException if my access pattern exceeds 3000 RCU or 1000 WCU and the table mode is "On Demand"?

Thanks, Evgeny.

Upvotes: 1

Views: 472

Answers (1)

Chris

Reputation: 1106

  1. Yes, that is an effective way to get your results.
  2. The distribution will not be optimal. But this might not be a problem, as DynamoDB can rebalance partitions and provides adaptive capacity for 'hot' partitions. Adaptive capacity has been instant since May 2019.
  3. I believe AWS refers to this technique as write sharding. Whether you need it really depends on your read and write workload. If your requirements exceed the maximum of 3000 RCU or 1000 WCU on a single partition key, then this is a good strategy to overcome that limit. Remember that it also affects how your application queries the table: you would need to issue a query per shard.
  4. The ProvisionedThroughputExceededException is thrown by the DynamoDB client and can therefore be caught.
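
To make the "query per shard" cost in point 3 concrete, here is a minimal Python sketch (the same idea carries over to the .NET SDK). It assumes a fixed shard count of 10 and a suffix calculated from the row number with a modulo, rather than the range-based layout in the question, so that consecutive rows land on different partition keys. `SHARD_COUNT`, the table name `FileRecords`, and both function names are assumptions for this example; the code only builds Query parameters and does not call AWS.

```python
SHARD_COUNT = 10  # assumed value; size it to the throughput one file needs


def sharded_partition_key(file_id: str, row_number: int) -> str:
    """Append a calculated suffix in 1..SHARD_COUNT to the file id."""
    shard = row_number % SHARD_COUNT + 1
    return f"{file_id}_{shard}"


def build_shard_queries(table_name: str, file_id: str, status: str) -> list:
    """Return one Query parameter set per shard; the caller merges results."""
    return [
        {
            "TableName": table_name,
            "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
            "ExpressionAttributeNames": {"#pk": "FileId", "#sk": "Status_recordId"},
            "ExpressionAttributeValues": {
                ":pk": {"S": f"{file_id}_{shard}"},
                ":prefix": {"S": f"{status}_"},
            },
        }
        for shard in range(1, SHARD_COUNT + 1)
    ]


queries = build_shard_queries("FileRecords", "File2", "Processing")
```

Every read of "all Processing records of File2" now costs `SHARD_COUNT` queries instead of one, which is why sharding is only worth it when a single partition key would otherwise exceed the per-partition limits.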

Upvotes: 0
