Reputation: 43
We are considering DynamoDB for an expectedly large dataset. I come from a strong SQL background so the No-SQL way of thinking is new to me.
I have a problem and design, but ran into what appears to be a dead end.
The documentation says to make sure your Hash keys are widely distributed to aid in performance, okay that makes sense.
I am going to be recording various datapoints/actions for users. It makes sense to me that the hash key should be the user-id, and my range key can be the action(s) performed.
Now, if I want all the actions user #1 performs, I can easily query that.
But, if I want all the USERS who performed action X, I cannot do that without a table scan. From the Query documentation:
A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.
So it would seem I am limited to getting data from a specific user, unless I am willing to do a table scan, which is slower and consumes many capacity units.
My question is, I think, ultimately a design question. Maybe I am missing something when it comes to No-SQL? Should my hash key be something else? Or is it simply that my requirements do not fit in with No-SQL (and more specifically, DynamoDB)?
It is almost as if the hash key is a kind of grouping with DynamoDB. I considered changing the hash key to the actions we are intending to put into place, but then I am not widely distributing my keys...
Upvotes: 3
Views: 1955
Reputation: 29
I guess the global secondary indexes option is better, as you get a single table.
Creating two tables will create redundancy and additional work to maintain consistency when doing any CUD (Create, Update, Delete) operation on any one table.
Upvotes: 1
Reputation: 360
You must create a Global Secondary Index (GSI). What this does is it creates a second pair of hash and range keys which differ from the original keys. You can then query the same table by also including an index name in your parameters.
Example in JS:
var table = tablename;
var index = actionId-username-gsi;
var action = actionId;
var params = {
TableName : table,
IndexName : index,
KeyConditionExpression : 'actionId = :v_actionId',
ExpressionAttributeValues : {
':v_actionId': { N : action }
},
ProjectionExpression : 'actionId, username'
};
ddb.query(params, err) {
if(err) {
// Oh well
} else {
// Do something
}
};
This will query the actionId-username-gsi index and look for any actionId hashes with the value provided. Using ProjectionExpression will return only the specified attributes' values for each item, lowering throughput if that ever becomes a concern. I hope this helps answer your question.
node.js aws amazon-dynamodb nosql
Upvotes: 1
Reputation: 14751
The DynamoDb way to meet your requirement to allow both types of queries is to store the data in two tables, one with hash key user-id and range key action-id, and one with hash key action-id and range key user-id.
And you should think about if you need all the data in both tables, or if one can be a summary table. For example, say you have a limited number of possible actions. Instead of putting the full record of every action in the user-keyed table, you might want a table with only one row for each user: a hash key of user - id, and a second column that is multiply valued and is a list of any action-id that the user has performed at least once.
Upvotes: 2