User

Reputation: 2148

Non diverse Global Secondary Index in DynamoDB

Let's assume that I have a table with the following attributes:

I will have a lot of users, but only a few categories.

user_id  |  category_id
1           1
3           1
4           1
5           3
..          ..
50000000    1

Is it OK to store millions of records with the same category_id value when category_id is the hash key of a Global Secondary Index? Should I expect any restrictions?

I'm also wondering whether a Scan would be a reasonable choice instead. I will filter by category_id only once a day. What is the cost (time and money) of scanning millions of records?

Thanks!

Upvotes: 0

Views: 366

Answers (1)

Max

Reputation: 8836

According to the Limits documentation, the only limitation is:

No practical limit for tables without local secondary indexes.

For a table with local secondary indexes, there is a limit on item collection sizes: For every distinct hash key value, the total sizes of all table and index items cannot exceed 10 GB. Depending on your item sizes, this may constrain the number of range keys per hash value. For more information, see Item Collection Size Limit.

As for your second question, whether to use Query or Scan, you asked about both performance and monetary cost. Maintaining a GSI is expensive: you pay for its throughput (and, if I recall correctly, its storage as well), so it's like paying for another table. It's also another table whose throughput you have to monitor to make sure you aren't being throttled. On the other hand, a Query against the index performs much better than a Scan when you only need one category.
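For illustration, here is a minimal sketch of what querying such an index could look like with a boto3 `Table` resource. The index name `category_id-index` is hypothetical, and the function assumes `table` was obtained via something like `boto3.resource("dynamodb").Table(...)`. Query results are paginated, so the loop follows `LastEvaluatedKey` until it is absent.

```python
def query_category(table, category_id, index_name="category_id-index"):
    """Yield every item in one category by paging through a GSI query.

    `table` is assumed to be a boto3 DynamoDB Table resource.
    `index_name` is a hypothetical GSI whose hash key is category_id.
    """
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": "category_id = :c",
        "ExpressionAttributeValues": {":c": category_id},
    }
    while True:
        page = table.query(**kwargs)
        yield from page.get("Items", [])
        # A missing LastEvaluatedKey means this was the final page.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
```

Only the items in the requested category are read from the index, which is where the performance advantage comes from.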

If you're planning on going through all categories once a day (which means reading every item in the table), then Scan is the way to go; you aren't gaining anything from Query. It's also cheaper (no extra GSI), and you don't have to worry about projections.
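To get a rough feel for the Scan cost, here is a back-of-the-envelope sketch. It relies on two facts about DynamoDB: a filter is applied after the items are read, so a Scan consumes capacity for everything it scans rather than for what it returns, and one read capacity unit covers a strongly consistent read of up to 4 KB (an eventually consistent read costs half). The average item size and provisioned throughput are assumptions you would replace with your own numbers, and the estimate ignores per-request rounding.

```python
import math


def scan_category(table, category_id):
    """Yield matching items from a full table scan with a filter.

    `table` is assumed to be a boto3 DynamoDB Table resource. The filter
    only trims the response; every item is still read and billed.
    """
    kwargs = {
        "FilterExpression": "category_id = :c",
        "ExpressionAttributeValues": {":c": category_id},
    }
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


def estimate_scan(item_count, avg_item_bytes, provisioned_rcu, consistent=False):
    """Return (read capacity units consumed, seconds at full throughput).

    Approximation: total bytes scanned divided into 4 KB units, halved
    for eventually consistent reads (the Scan default).
    """
    units = math.ceil(item_count * avg_item_bytes / 4096)
    if not consistent:
        units = math.ceil(units / 2)
    return units, units / provisioned_rcu


# Hypothetical numbers: 50 million items of about 100 bytes each,
# scanned with 1000 provisioned read capacity units.
units, seconds = estimate_scan(50_000_000, 100, 1000)
```

With those assumed numbers the scan consumes roughly 610,000 read capacity units, which takes about ten minutes at 1000 units per second. The monetary cost then depends on how that capacity is priced in your region and billing mode.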

Upvotes: 1
