DynamoDB data modeling

Question

I have a Java BitSet representing unique users, and I would like to store it in DynamoDB so that I can run queries like "give me all BitSets from date X to date Y for a given key".

My first approach was to use a hash key representing what I actually want to count, for example an action such as "users-who-pay". The range key is the date, and the BitSet itself is stored in a binary attribute.
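A minimal sketch of that layout, assuming the BitSet is serialized with `BitSet.toByteArray()` and restored with `BitSet.valueOf(byte[])`, and that dates are stored as ISO-8601 strings (`yyyy-MM-dd`) so their lexicographic order matches chronological order. A `TreeMap` stands in for DynamoDB here; its inclusive `subMap` mimics a Query with a `BETWEEN` condition on the range key. The class `UniquesTable` and its methods are hypothetical names, not part of any AWS SDK.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table with
// hash key = action (e.g. "users-who-pay"), range key = ISO date, binary attribute = BitSet bytes.
final class UniquesTable {
    // hash key -> (range key -> binary value); range keys are kept sorted, as DynamoDB does
    private final Map<String, TreeMap<String, byte[]>> items = new HashMap<>();

    // Store one BitSet as a byte[] (what a DynamoDB Binary attribute would hold).
    void put(String action, String isoDate, BitSet uniques) {
        items.computeIfAbsent(action, k -> new TreeMap<>()).put(isoDate, uniques.toByteArray());
    }

    // Roughly: action = :a AND date BETWEEN :from AND :to (both ends inclusive).
    Map<String, BitSet> query(String action, String fromDate, String toDate) {
        Map<String, BitSet> result = new LinkedHashMap<>();
        TreeMap<String, byte[]> byDate = items.get(action);
        if (byDate == null) {
            return result; // no items under this hash key
        }
        NavigableMap<String, byte[]> range = byDate.subMap(fromDate, true, toDate, true);
        for (Map.Entry<String, byte[]> e : range.entrySet()) {
            result.put(e.getKey(), BitSet.valueOf(e.getValue())); // deserialize each BitSet
        }
        return result;
    }
}
```

Results come back in date order because the range keys are sorted. With only a handful of hash-key values, though, every item for an action lives under the same key, which is the concern raised in the question.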

However, this is probably not a good approach, because I will have only a few keys and lots of dates. Can anyone recommend a better approach?

Rohit Kulshreshtha · Accepted Answer

A section of the DynamoDB documentation deals with a similar use case. See Take Advantage of Sparse Indexes:

Take Advantage of Sparse Indexes

For any item in a table, DynamoDB will only write a corresponding entry to a global secondary index if the index key value is present in the item. For global secondary indexes, this is the index hash key and its range key (if present). If the index key value(s) do not appear in every table item, the index is said to be sparse.

You can use a sparse global secondary index to efficiently locate table items that have an uncommon attribute. To do this, you take advantage of the fact that table items that do not contain global secondary index attribute(s) are not indexed at all. For example, in the GameScores table, certain players might have earned a particular achievement for a game - such as "Champ" - but most players have not. Rather than scanning the entire GameScores table for Champs, you could create a global secondary index with a hash key of Champ and a range key of UserId. This would make it easy to find all the Champs by querying the index instead of scanning the table.

Such a query can be very efficient, because the number of items in the index will be significantly fewer than the number of items in the table. In addition, the fewer table attributes you project into the index, the fewer read capacity units you will consume from the index.
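To make the sparse-index rule concrete, here is an in-memory sketch (not the DynamoDB API) in which an index entry is written only when an item carries the index key attributes. The attribute names `Champ` and `UserId` come from the quoted GameScores example; the `SparseIndexDemo` class is hypothetical, and its index stores only the UserId to echo the point about projecting fewer attributes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of a sparse global secondary index.
final class SparseIndexDemo {
    // Base "table": each item is a map of attribute name -> value.
    private final List<Map<String, String>> table = new ArrayList<>();
    // "Index": Champ value -> UserIds. Only items that have both attributes get an entry;
    // keys need not be unique, so a user can appear more than once.
    private final Map<String, List<String>> champIndex = new HashMap<>();

    void putItem(Map<String, String> item) {
        table.add(item);
        String champ = item.get("Champ");
        String userId = item.get("UserId");
        // Mirrors the documented rule: no index key value, no index entry.
        if (champ != null && userId != null) {
            champIndex.computeIfAbsent(champ, k -> new ArrayList<>()).add(userId);
        }
    }

    // "Query the index": reads only indexed items instead of scanning the whole table.
    List<String> champUserIds(String champValue) {
        return new ArrayList<>(champIndex.getOrDefault(champValue, List.of()));
    }

    int tableSize() {
        return table.size();
    }

    int indexSize() {
        int n = 0;
        for (List<String> ids : champIndex.values()) {
            n += ids.size();
        }
        return n;
    }
}
```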

The example sounds very similar to your "users-who-pay" use case; the only difference is that you replace "Champ" with "paying user". However, it describes a situation in which very few users are Champs, and that is why it is acceptable to have "Champ" as the hash key (read more about good hash keys here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html). If a large share of your users pay, all of that traffic would concentrate on a single hash key value, which those guidelines advise against. This could be remedied by using (say) 100 hash keys for champs: champ00, champ01, ..., champ99. One of these values could be chosen randomly at the time of writing the entry into DynamoDB.
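The suffix scheme can be sketched as follows, again with an in-memory map standing in for the table. Writes pick a random suffix in `00`..`99` (the count of 100 comes from the answer); reads must then query every suffix for the date range and merge the results, since a reader cannot know which suffix an item landed on. The class `ShardedUniques`, the `#` separator, and OR-merging BitSets that share a date are assumptions for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of write sharding: "users-who-pay#00" ... "users-who-pay#99".
final class ShardedUniques {
    static final int SHARDS = 100;

    // shard hash key -> (ISO date -> serialized BitSet); stands in for the DynamoDB table
    private final Map<String, TreeMap<String, byte[]>> table = new HashMap<>();
    private final Random random;

    ShardedUniques(Random random) {
        this.random = random;
    }

    static String shardKey(String action, int shard) {
        return String.format("%s#%02d", action, shard);
    }

    // Write: a random suffix spreads one action's items over SHARDS hash keys.
    // Like PutItem, writing the same key and date twice replaces the earlier value.
    String put(String action, String isoDate, BitSet uniques) {
        String key = shardKey(action, random.nextInt(SHARDS));
        table.computeIfAbsent(key, k -> new TreeMap<>()).put(isoDate, uniques.toByteArray());
        return key;
    }

    // Read: one inclusive range query per shard, then merge by date.
    // OR-ing takes the union of users if one date ended up in several shards.
    TreeMap<String, BitSet> query(String action, String fromDate, String toDate) {
        TreeMap<String, BitSet> merged = new TreeMap<>();
        for (int s = 0; s < SHARDS; s++) {
            TreeMap<String, byte[]> byDate = table.get(shardKey(action, s));
            if (byDate == null) {
                continue;
            }
            for (Map.Entry<String, byte[]> e : byDate.subMap(fromDate, true, toDate, true).entrySet()) {
                merged.computeIfAbsent(e.getKey(), d -> new BitSet()).or(BitSet.valueOf(e.getValue()));
            }
        }
        return merged;
    }
}
```

The cost moves to the read side: each date-range query becomes 100 queries (which could be issued in parallel), so a smaller shard count may be a reasonable trade-off when write volume is modest.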
