Reputation: 2315
I have the following dataset
Post Table
| UserID | title |
-----------------------
| userA | Article 1 |
| userA | Article 2 |
| userB | Article 3 |
| userC | Article 4 |
| userB | Article 5 |
I'd like to get all the articles from userA and userB. In reality, I can have 50 userIDs as input and thousands of posts in the output.
What is the best way to perform such a query?
With MySQL, I knew it was better to do one big query than many small ones. But is this true with DynamoDB?
Upvotes: 1
Views: 2281
Reputation: 908
When using DynamoDB, you need to stop thinking about SQL databases. The paradigm is completely different.
To answer your question: it is best to use one query, because every small query has its consumed Read Capacity rounded up, even if it does not use the whole unit.
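To make the rounding concrete, here is a minimal sketch of the arithmetic. It assumes the standard read pricing (reads billed in 4 KB blocks, eventually consistent reads costing half a unit per block) and a simplified model in which every request is charged at least one block. The function name and the 50 KB workload are made up for illustration.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # reads are billed in 4 KB blocks


def read_capacity_units(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Capacity consumed by ONE read request that read `bytes_read` bytes.

    Simplified model: the size is rounded up to the next 4 KB block, and an
    eventually consistent read costs half a unit per block.
    """
    blocks = max(1, math.ceil(bytes_read / RCU_BLOCK_BYTES))
    return blocks * (1.0 if strongly_consistent else 0.5)


# Hypothetical workload: 50 KB of items in total.
many_small = sum(read_capacity_units(1024) for _ in range(50))  # 50 requests of 1 KB
one_large = read_capacity_units(50 * 1024)                      # 1 request of 50 KB

print(many_small)  # 25.0
print(one_large)   # 6.5
```

Each 1 KB request is rounded up to a full 4 KB block, so the fifty small reads cost almost four times as much as one read of the same total size.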
Avoid Scan as much as possible: it is the most expensive operation, because the filtering happens AFTER all the items have been read.
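The difference can be sketched with a small in-memory model of the Post table from the question. This is not a DynamoDB client: the two functions are hypothetical stand-ins that only count how many items each access path has to read.

```python
from collections import defaultdict

# In-memory stand-in for the Post table from the question (not real DynamoDB).
POSTS = [
    {"UserID": "userA", "title": "Article 1"},
    {"UserID": "userA", "title": "Article 2"},
    {"UserID": "userB", "title": "Article 3"},
    {"UserID": "userC", "title": "Article 4"},
    {"UserID": "userB", "title": "Article 5"},
]


def scan_with_filter(items, user_id):
    """Model of Scan + filter: every item is read, then non-matching ones are dropped."""
    items_read = 0
    matches = []
    for item in items:
        items_read += 1  # charged for the read, whether or not it matches
        if item["UserID"] == user_id:
            matches.append(item)
    return matches, items_read


def query_by_hash_key(partitions, user_id):
    """Model of Query: only the items stored under one hash key are read."""
    matches = partitions.get(user_id, [])
    return matches, len(matches)


# Group the items by hash key, as the table itself would.
partitions = defaultdict(list)
for post in POSTS:
    partitions[post["UserID"]].append(post)

print(scan_with_filter(POSTS, "userA")[1])        # 5 items read
print(query_by_hash_key(partitions, "userA")[1])  # 2 items read
```

Both calls return the same two articles, but the scan pays for all five items in the table.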
In your case, I see a clear access pattern where your Primary Key is a combination of `UserID` (your `hashKey`) and `title` (your `rangeKey`).
However, that would make your table very rigid: your `hashKey` would always be a `UserID` and your `rangeKey` always a `title`. So instead of naming your keys `UserID` and `title`, use generic names such as `pk` and `sk`, both plain strings. This way you can run very powerful queries (the most efficient way to get data out of your table), such as `pk=userA and sk BEGINS_WITH Article`. But this assumes that all your article titles start with the word `Article`, which I don't think is the case.
Going a bit further, and to enable a single-table app, you can prefix your different entities when you save them. For example, you could prefix every `UserID` with `USER#` and every article `title` with `ART#`, so your table would look like this (note the new column names):
Post Table
| pk | sk |
------------------------------
| USER#userA | ART#Article 1 |
| USER#userA | ART#Article 2 |
| USER#userB | ART#Article 3 |
| USER#userC | ART#Article 4 |
| USER#userB | ART#Article 5 |
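One way to apply the prefixes consistently is to build the keys in a single helper at save time. The sketch below is only an illustration: `article_item` and the two constants are hypothetical names, and the returned dict holds just the two key attributes shown in the table above.

```python
USER_PREFIX = "USER#"
ARTICLE_PREFIX = "ART#"


def article_item(user_id: str, title: str) -> dict:
    """Build the key attributes for one article row, adding the entity prefixes."""
    return {
        "pk": f"{USER_PREFIX}{user_id}",   # hash key: who wrote the article
        "sk": f"{ARTICLE_PREFIX}{title}",  # range key: which article
    }


print(article_item("userA", "Article 1"))
# {'pk': 'USER#userA', 'sk': 'ART#Article 1'}
```

Keeping the prefixes in one place avoids typos such as `USR#` that would silently put items where no query looks for them.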
With this setup, you can now run `pk=USER#userA and sk BEGINS_WITH ART#` to get, in JUST ONE query, all the articles for `userA`.
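As a sketch of how that condition could be expressed, the helper below builds the parameters of a Query request in the low-level DynamoDB format, where `begins_with` is the expression function behind `BEGINS_WITH`. Nothing is sent to AWS here. The table name `Post` and the function name are assumptions. Note that a key condition tests equality on a single hash key value, so for several users the sketch builds one request per user.

```python
def articles_query(user_id: str, table_name: str = "Post") -> dict:
    """Parameters for a Query returning every article of one user.

    The dict follows the low-level Query request shape; with boto3 it could be
    passed as client.query(**params). The table name is an assumption.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "ART#"},
        },
    }


# One request per hash key: two users here, 50 in the question's real case.
requests = [articles_query(user_id) for user_id in ["userA", "userB"]]

for params in requests:
    print(params["ExpressionAttributeValues"][":pk"]["S"])
# USER#userA
# USER#userB
```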
With this prefixing approach, nothing stops you from having a heterogeneous single table (the real power of DynamoDB is about to be unleashed). For example:
My heterogeneous Table
| pk | sk |
--------------------------------
| USER#userA | ART#Article 1 |
| USER#userA | ART#Article 2 |
| USER#userB | ART#Article 3 |
| USER#userC | ART#Article 4 |
| USER#userB | ART#Article 5 |
| ART#Article 5 | COM#Comment 1 |
| ART#Article 5 | COM#Comment 2 |
| ART#Article 6 | COM#Comment 1 |
| ART#Article 6 | COM#Comment 2 |
What if you want all the comments for a given article? Easy: `pk=ART#Article 5 and sk BEGINS_WITH COM#`.
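The same query shape serving two entity types can be checked against an in-memory copy of the table above. This is a toy model rather than a DynamoDB call: the `query` function is a hypothetical stand-in that matches the hash key exactly, filters the range key by prefix, and returns results in range-key order.

```python
# In-memory copy of the heterogeneous table above (not real DynamoDB).
TABLE = [
    {"pk": "USER#userA", "sk": "ART#Article 1"},
    {"pk": "USER#userA", "sk": "ART#Article 2"},
    {"pk": "USER#userB", "sk": "ART#Article 3"},
    {"pk": "USER#userC", "sk": "ART#Article 4"},
    {"pk": "USER#userB", "sk": "ART#Article 5"},
    {"pk": "ART#Article 5", "sk": "COM#Comment 1"},
    {"pk": "ART#Article 5", "sk": "COM#Comment 2"},
    {"pk": "ART#Article 6", "sk": "COM#Comment 1"},
    {"pk": "ART#Article 6", "sk": "COM#Comment 2"},
]


def query(table, pk, sk_prefix):
    """Model of `pk = X and sk BEGINS_WITH prefix`, sorted by range key."""
    return sorted(
        item["sk"]
        for item in table
        if item["pk"] == pk and item["sk"].startswith(sk_prefix)
    )


print(query(TABLE, "ART#Article 5", "COM#"))  # ['COM#Comment 1', 'COM#Comment 2']
print(query(TABLE, "USER#userB", "ART#"))     # ['ART#Article 3', 'ART#Article 5']
```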
Finally, you can use a technique called Inverted Index to create a GSI (Global Secondary Index) in which you simply swap your `pk` and your `sk`, giving you even more possible queries. This is clearly explained in the first of these masterpieces, which I recommend you watch in this order:
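Coming back to the inverted index itself, here is a rough in-memory illustration of what swapping the keys buys you. It only models the idea (a real GSI is declared on the table and maintained by DynamoDB), and `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

# A few rows from the heterogeneous table (in-memory stand-in, not real DynamoDB).
TABLE = [
    {"pk": "USER#userA", "sk": "ART#Article 1"},
    {"pk": "USER#userB", "sk": "ART#Article 5"},
    {"pk": "ART#Article 5", "sk": "COM#Comment 1"},
    {"pk": "ART#Article 6", "sk": "COM#Comment 1"},
]


def build_inverted_index(table):
    """Model of an inverted-index GSI: the base table's sk becomes the index
    hash key and the base table's pk becomes the index range key."""
    index = defaultdict(list)
    for item in table:
        index[item["sk"]].append(item["pk"])
    for range_keys in index.values():
        range_keys.sort()  # items under one hash key are ordered by range key
    return dict(index)


inverted = build_inverted_index(TABLE)

# New access patterns the base table could not serve with a Query:
print(inverted["ART#Article 5"])  # ['USER#userB'] -> who wrote Article 5?
print(inverted["COM#Comment 1"])  # ['ART#Article 5', 'ART#Article 6']
```

In the base table you look up articles by user; through the inverted index you can look up the user by article, without adding any new attributes.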
Upvotes: 7