maxime

Reputation: 2315

DynamoDB, what's better: multiple small queries or one big query?

I have the following dataset:

Post Table

| UserID | title     |
-----------------------
| userA  | Article 1 |
| userA  | Article 2 |
| userB  | Article 3 |
| userC  | Article 4 |
| userB  | Article 5 |

I'd like to get all the articles from userA and userB. In reality, I can have 50 userIDs as input and a thousand posts in the output.

What is the best way to perform such a query: several small queries (one per user) or one big query?

With MySQL, I know it is better to do one big query than many small ones. Is this also true with DynamoDB?

Upvotes: 1

Views: 2281

Answers (1)

davidrv87

Reputation: 908

When using DynamoDB, you need to stop thinking about SQL databases. The paradigm is completely different.

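To answer your question, it is best to use one query. DynamoDB sums the size of all the items a request reads and rounds it up to the next 4 KB of Read Capacity, so every small query pays that rounding separately (and consumes at least one read unit, or half a unit for eventually consistent reads) even if you don't use it all.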
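A rough way to see the rounding effect is to do the arithmetic. The sketch below is a simplified model, not the billing engine: it assumes the documented 4 KB per strongly consistent read unit (half for eventually consistent reads), ignores the 1 MB page limit (the hypothetical totals here stay under it), and uses made-up sizes: 50 users with 20 posts of 500 bytes each, i.e. the thousand posts from the question. It only models rounding; a real Query still targets a single partition key value.

```python
import math

# Assumed from DynamoDB's read-capacity model: one strongly consistent
# read unit covers up to 4 KB read per request; eventually consistent
# reads cost half as much.
RCU_BLOCK_BYTES = 4 * 1024


def query_rcu(item_sizes, strongly_consistent=True):
    """Read capacity consumed by ONE Query request.

    The sizes of all items read are summed first, then rounded up to
    the next 4 KB block, so the rounding happens once per request.
    """
    blocks = math.ceil(sum(item_sizes) / RCU_BLOCK_BYTES)
    return blocks if strongly_consistent else blocks / 2


# Hypothetical workload: 50 users, 20 posts each, 500 bytes per post.
USERS, POSTS_PER_USER, POST_BYTES = 50, 20, 500
per_user = [POST_BYTES] * POSTS_PER_USER

# Many small queries: each request rounds its own 10,000 bytes up.
many_small = sum(query_rcu(per_user) for _ in range(USERS))

# One request reading the same 1,000 items: rounded up only once.
one_big = query_rcu(per_user * USERS)

print(many_small, one_big)  # 150 123
```

Each per-user request reads 10,000 bytes and is billed 3 units, while the single request reads 500,000 bytes and is billed 123, so the gap grows with the number of requests rather than the amount of data.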

Avoid Scan as much as possible: it is the most expensive operation, because it reads every item in the table and the filtering happens only AFTER all the items have been read, so you pay for everything it reads.
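To make the cost difference concrete, here is a minimal in-memory sketch (a simulation, not the DynamoDB API). It assumes a hypothetical table of 1,000 posts of 500 bytes spread over 100 users, and reuses the 4 KB rounding model: a filtered Scan is billed for the whole table, a Query only for the partition it reads.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024

# Hypothetical table: 1,000 posts of 500 bytes spread over 100 users.
table = [
    {"UserID": f"user{i % 100}", "title": f"Article {i}", "size": 500}
    for i in range(1000)
]


def rcu(items_read):
    """Strongly consistent read units for the bytes actually read."""
    return math.ceil(sum(item["size"] for item in items_read) / RCU_BLOCK_BYTES)


def scan_with_filter(items, user_id):
    # Scan reads EVERY item; the filter only trims what is returned.
    returned = [item for item in items if item["UserID"] == user_id]
    return returned, rcu(items)


def query_partition(items, user_id):
    # Stand-in for Query: DynamoDB locates the partition directly,
    # so only the matching items are read (and billed).
    read = [item for item in items if item["UserID"] == user_id]
    return read, rcu(read)


scan_items, scan_cost = scan_with_filter(table, "user7")
query_items, query_cost = query_partition(table, "user7")
print(len(scan_items), scan_cost)    # 10 123
print(len(query_items), query_cost)  # 10 2
```

Both calls return the same 10 posts, but in this model the Scan is charged for all 500,000 bytes and the Query for only 5,000.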

Deep dive into DynamoDB

In your case, I see a clear access pattern: your primary key is a combination of UserID (your hashKey, i.e. partition key) and title (your rangeKey, i.e. sort key).

However, that would make your table very rigid: your hashKey would always be a UserID and your rangeKey always a title. So instead of naming your keys UserID and title, just use generic names such as pk and sk, both of type string. This way you can run very powerful queries (Query is the most efficient way to get data from your table), such as pk=userA and sk BEGINS_WITH Article. But this assumes that all your article titles start with the word Article, which I doubt will be the case.
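The key condition above can be mimicked with a small in-memory sketch (not a DynamoDB client). The `query` helper and the extra title "My trip to Lisbon" are hypothetical; they show why a prefix tied to the title text misses articles whose titles don't start with "Article".

```python
# In-memory stand-in for a table whose keys have generic names:
# "pk" (partition/hash key) and "sk" (sort/range key), both strings.
items = [
    {"pk": "userA", "sk": "Article 1"},
    {"pk": "userA", "sk": "Article 2"},
    {"pk": "userB", "sk": "Article 3"},
    {"pk": "userC", "sk": "Article 4"},
    {"pk": "userB", "sk": "Article 5"},
    {"pk": "userA", "sk": "My trip to Lisbon"},  # hypothetical title
]


def query(table, pk, sk_prefix=""):
    """Mimic a key condition `pk = :pk AND begins_with(sk, :prefix)`.

    Results are ordered by sort key, ascending, like a default Query.
    """
    matches = [
        item for item in table
        if item["pk"] == pk and item["sk"].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item["sk"])


print([i["sk"] for i in query(items, "userA", "Article")])
# ['Article 1', 'Article 2']  (the Lisbon post is missed)
```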

Going a bit further, and to enable a single-table design, you can prefix your different entities when you save them. For example, you could prefix all your UserIDs with USER# and all your article titles with ART#, so your table would look like this (note the new column names):

Post Table

| pk          | sk            |
------------------------------
| USER#userA  | ART#Article 1 |
| USER#userA  | ART#Article 2 |
| USER#userB  | ART#Article 3 |
| USER#userC  | ART#Article 4 |
| USER#userB  | ART#Article 5 |

With this setup, you can now query pk=USER#userA and sk BEGINS_WITH ART# to get all the articles for userA in JUST ONE query, whatever their titles.
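A minimal sketch of the prefixing idea, again as an in-memory simulation: `user_key` and `article_key` are hypothetical helpers applied at save time, and the extra "My trip to Lisbon" row is made up to show that the ART# prefix now matches any title.

```python
# Hypothetical helpers that add the entity prefix when an item is saved.
def user_key(user_id):
    return f"USER#{user_id}"


def article_key(title):
    return f"ART#{title}"


table = [
    {"pk": user_key("userA"), "sk": article_key("Article 1")},
    {"pk": user_key("userA"), "sk": article_key("Article 2")},
    {"pk": user_key("userB"), "sk": article_key("Article 3")},
    {"pk": user_key("userC"), "sk": article_key("Article 4")},
    {"pk": user_key("userB"), "sk": article_key("Article 5")},
    {"pk": user_key("userA"), "sk": article_key("My trip to Lisbon")},
]


def query(items, pk, sk_prefix=""):
    """Stand-in for `pk = :pk AND begins_with(sk, :prefix)`, sorted by sk."""
    return sorted(
        (i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)),
        key=lambda i: i["sk"],
    )


articles = query(table, user_key("userA"), "ART#")
print([i["sk"] for i in articles])
# ['ART#Article 1', 'ART#Article 2', 'ART#My trip to Lisbon']
```

Because the prefix encodes the entity type rather than the title text, the key condition no longer depends on how users name their articles.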

With this prefixing approach, nothing stops you from having a heterogeneous single table (this is where the real power of DynamoDB is unleashed). For example:

My heterogeneous Table

| pk            | sk            |
--------------------------------
| USER#userA    | ART#Article 1 |
| USER#userA    | ART#Article 2 |
| USER#userB    | ART#Article 3 |
| USER#userC    | ART#Article 4 |
| USER#userB    | ART#Article 5 |
| ART#Article 5 | COM#Comment 1 |
| ART#Article 5 | COM#Comment 2 | 
| ART#Article 6 | COM#Comment 1 |
| ART#Article 6 | COM#Comment 2 |

What if you want all the comments for a given article? Easy: pk=ART#Article 5 and sk BEGINS_WITH COM#.
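Here is the heterogeneous table from above as an in-memory sketch (a simulation with a hypothetical `query` helper, not the DynamoDB API), showing that the same key condition shape serves both access patterns:

```python
# The heterogeneous table above, as plain (pk, sk) tuples.
table = [
    ("USER#userA", "ART#Article 1"),
    ("USER#userA", "ART#Article 2"),
    ("USER#userB", "ART#Article 3"),
    ("USER#userC", "ART#Article 4"),
    ("USER#userB", "ART#Article 5"),
    ("ART#Article 5", "COM#Comment 1"),
    ("ART#Article 5", "COM#Comment 2"),
    ("ART#Article 6", "COM#Comment 1"),
    ("ART#Article 6", "COM#Comment 2"),
]


def query(items, pk, sk_prefix=""):
    """Stand-in for `pk = :pk AND begins_with(sk, :prefix)`, sorted by sk."""
    return sorted(sk for p, sk in items if p == pk and sk.startswith(sk_prefix))


print(query(table, "ART#Article 5", "COM#"))  # ['COM#Comment 1', 'COM#Comment 2']
print(query(table, "USER#userB", "ART#"))     # ['ART#Article 3', 'ART#Article 5']
```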

Finally, you can use a technique called Inverted Index: create a GSI (global secondary index) whose partition key is your table's sk and whose sort key is your table's pk, which gives you even more potential queries. This is clearly explained in the first of these masterpieces, which I recommend watching in this order:

  1. AWS re:Invent 2019
  2. AWS re:Invent 2018
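The inverted index can be pictured with the same in-memory model. In this sketch the "GSI" is simply the list of swapped pairs (DynamoDB would maintain a real GSI for you), and `query` is a hypothetical stand-in for a key condition:

```python
# Base table (pk, sk) pairs, taken from the heterogeneous table above.
table = [
    ("USER#userA", "ART#Article 1"),
    ("USER#userA", "ART#Article 2"),
    ("USER#userB", "ART#Article 3"),
    ("USER#userC", "ART#Article 4"),
    ("USER#userB", "ART#Article 5"),
    ("ART#Article 5", "COM#Comment 1"),
    ("ART#Article 5", "COM#Comment 2"),
    ("ART#Article 6", "COM#Comment 1"),
    ("ART#Article 6", "COM#Comment 2"),
]

# Inverted index: partition key = base sk, sort key = base pk.
inverted = [(sk, pk) for pk, sk in table]


def query(items, pk, sk_prefix=""):
    """Stand-in for `pk = :pk AND begins_with(sk, :prefix)`, sorted by sk."""
    return sorted(sk for p, sk in items if p == pk and sk.startswith(sk_prefix))


# Who wrote Article 3? (the base table would need a Scan for this)
print(query(inverted, "ART#Article 3", "USER#"))  # ['USER#userB']
# Which articles have a "Comment 1"?
print(query(inverted, "COM#Comment 1", "ART#"))   # ['ART#Article 5', 'ART#Article 6']
```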

Upvotes: 7
