Mojimi

Reputation: 3161

DynamoDB - Partition grouping or sharding?

So, looking through the DynamoDB docs, they often recommend that you "group" related items together in the same partition, so as to better distribute your partition usage.

Take the following example, where a user has contacts and invoices inside its partition:

[image: picture 1]

So, if I need all of user_001's invoices, I will simply query (pseudo):

QUERY WHERE PartitionKey = "user_001" AND SortKey.begins_with("invoice_")
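For reference, the pseudo-query above might be written as real request parameters roughly like this. It's a minimal sketch, assuming the key attributes are literally named PartitionKey and SortKey and that the table is called AppTable (both are guesses, as is the helper name build_invoice_query). It only builds the arguments for the low-level Query operation and does not call AWS.

```python
def build_invoice_query(user_id: str, table_name: str = "AppTable") -> dict:
    """Build keyword arguments for a low-level DynamoDB Query request.

    "AppTable", "PartitionKey" and "SortKey" are assumed names, not taken
    from a real schema.
    """
    return {
        "TableName": table_name,
        # Both conditions target key attributes, so they go in the key condition.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PartitionKey", "#sk": "SortKey"},
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":prefix": {"S": "invoice_"},
        },
    }


params = build_invoice_query("user_001")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```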

But I recently noticed there's quite an issue when you use the method above.

You see, DynamoDB will search inside the whole user_001 partition for the invoices, and will consume read capacity based on all items searched, whether they were invoices or not.

This can end up being very inefficient if you have a partition that is too big. Let's say I had 10,000 contacts and 2 invoices: it could end up being very costly to get those 2 invoices.

I'm assuming this based on this quote from the docs:

DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application

The solution:

[image: picture 2]

Wouldn't this be a better approach?

1) It shards the data better so I don't need to use begins_with

2) It allows me to use a time-based uuid as the sort key and enable more complex ordering/pagination

3) I will consume much less capacity on queries since it won't have to go through items I don't need

What's the question?

Well, what I said above is just theory and assumption. The documentation doesn't make it clear how it really works behind the scenes, and it even recommends using picture 1.

But I really think picture 2 is the best here, especially when you consider that DynamoDB now smartly distributes capacity across your partitions (rather than evenly, as it used to).

So, are my reasons for thinking picture 2 is much better than picture 1 valid?

Upvotes: 2

Views: 349

Answers (1)

Matthew Pope

Reputation: 7679

You have assumed incorrectly: the documentation you have quoted applies to filter expressions.

If you have a condition that applies to your sort key, that should be part of the key condition expression, not a filter expression.
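To put rough numbers on the difference, here is a back-of-the-envelope sketch using the 10,000 contacts and 2 invoices from the question. It assumes every item is 400 bytes (a made-up figure), uses the documented rule that a Query rounds the total size of the items it reads up to the next 4 KB, with eventually consistent reads costing half, and ignores the 1 MB page limit for simplicity. The helper query_rcus is hypothetical.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def query_rcus(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Approximate RCUs for one Query, ignoring 1 MB pagination."""
    blocks = math.ceil(bytes_read / RCU_BLOCK_BYTES)
    return float(blocks) if strongly_consistent else blocks / 2


ITEM_BYTES = 400   # assumed average item size
CONTACTS = 10_000
INVOICES = 2

# Sort key condition in the key condition: only the invoices are read.
key_condition_cost = query_rcus(INVOICES * ITEM_BYTES)

# Same condition as a filter on a whole-partition query: every item is read
# first, then the non-invoices are discarded.
filter_cost = query_rcus((CONTACTS + INVOICES) * ITEM_BYTES)

print(key_condition_cost, filter_cost)  # 0.5 488.5
```

Under these assumptions, the filter version is the one that matches the costly scenario described in the question.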

Upvotes: 0
