Reputation: 3900
I'm confused as to how to use DynamoDB table keys. The documentation mentions HASH keys (which seem to also be referred to as Partition keys) and RANGE (or SORT?) keys. I'm trying to roughly align these with my previous understanding of database indexing theory.
My current, mostly guess-based understanding is that a HASH key is essentially a primary key - it must be unique and is automatically indexed for fast-reading - and a RANGE key is basically something you should apply to any other field you plan on querying on (either in a WHERE-like or sorting context).
This is then somewhat confused by the introductions of Local and Global Secondary Indexes. How do they play into things?
If anyone could nudge me in the right direction, bearing in mind my current, probably flawed understanding has come from the docs, I'd be super grateful.
Thanks!
Upvotes: 7
Views: 4190
Reputation: 39186
Basically, a DynamoDB table is partitioned by its partition key (also called the hash key).
1) If the table has only a partition key, then its value has to be unique for each item. Table performance depends heavily on the partition key. A good partition key has well-scattered values, so that reads and writes spread evenly across partitions (don't just carry over a sequence number as the partition key, like an RDBMS primary key in legacy systems).
2) If the table has both a partition key and a sort key (also called the RANGE key), then the combination of the two needs to be unique. This is similar to a composite key in RDBMS terms.
However, the usage differs in DynamoDB. DynamoDB has no sorting functionality (i.e. no ORDER BY clause) that works across partition keys. Results can be ordered only by the sort key, within a single partition key value. For example, if you have 10 items with the same partition key value and different sort key values, a query returns them ordered by the sort key attribute. You can't sort on any other attribute, including the partition key.
All items that share a partition key value are kept together in the same partition, ordered by sort key, for better performance (i.e. they are physically co-located).
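To make the two key rules concrete, here is a toy in-memory model in Python. It is not DynamoDB code; `ToyTable`, `put` and `query` are made-up names, and the key values are invented. It only sketches the idea that the (partition key, sort key) pair identifies one item and that ordering is available only within one partition key value.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        # partition key value -> list of (sort key, item), kept ordered by sort key
        self._partitions = defaultdict(list)

    def put(self, pk, sk, item):
        rows = self._partitions[pk]
        # The (pk, sk) pair identifies one item, so a second put overwrites it.
        for i, (existing_sk, _) in enumerate(rows):
            if existing_sk == sk:
                rows[i] = (sk, item)
                return
        rows.append((sk, item))
        rows.sort(key=lambda row: row[0])

    def query(self, pk, descending=False):
        # Only one partition is read; its items come back in sort key order.
        rows = self._partitions.get(pk, [])
        ordered = reversed(rows) if descending else rows
        return [item for _, item in ordered]


table = ToyTable()
table.put("user#1", "2024-03-01", {"total": 30})
table.put("user#1", "2024-01-15", {"total": 10})
table.put("user#2", "2024-02-01", {"total": 20})
table.put("user#1", "2024-01-15", {"total": 11})  # same pk + sk: replaces the earlier item

print(table.query("user#1"))
```

Note there is no method that returns everything sorted across `user#1` and `user#2`; that mirrors the missing cross-partition ORDER BY.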
LSI - A table can have up to 5 local secondary indexes (LSIs). They must be defined when you create the table. An LSI keeps the table's partition key but gives you an alternate sort key.
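As a rough sketch of what "alternate sort key, defined at creation" looks like, here are request parameters shaped for the `CreateTable` API. The table name, attribute names and index name (`Orders`, `customer_id`, `order_date`, `order_status`, `by_status`) are invented for illustration, and the actual boto3 call is left commented out so the snippet runs without AWS credentials.

```python
create_table_params = {
    "TableName": "Orders",  # hypothetical table
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "order_status", "AttributeType": "S"},
    ],
    # Table key: partition (HASH) key plus sort (RANGE) key.
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_date", "KeyType": "RANGE"},
    ],
    # LSI: same partition key as the table, different sort key.
    "LocalSecondaryIndexes": [
        {
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_status", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def key_of(schema, key_type):
    """Return the attribute name with the given KeyType from a KeySchema list."""
    return next(k["AttributeName"] for k in schema if k["KeyType"] == key_type)


lsi_schema = create_table_params["LocalSecondaryIndexes"][0]["KeySchema"]
table_schema = create_table_params["KeySchema"]
print(key_of(table_schema, "HASH"), key_of(lsi_schema, "HASH"))

# With boto3 installed and credentials configured, this would be sent as:
# import boto3
# boto3.client("dynamodb").create_table(**create_table_params)
```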
GSI - In order to understand GSI, you need to understand the difference between SCAN and QUERY API in DynamoDB.
SCAN - is used when you don't know the partition key (i.e. a full table scan to find the item)
QUERY - is used when you know the partition key (a condition on the sort key is optional)
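DynamoDB cost is based on read/write capacity units, and a scan reads every item in the table, so for both cost and performance a scan is not the best option for most use cases. Instead, you can create a GSI with an alternate partition key (and optionally an alternate sort key) chosen to match your Query Access Pattern (QAP), so that those lookups can use QUERY rather than SCAN.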
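The difference can be sketched with another toy model in plain Python (again not DynamoDB code; the data and the names `scan_by_customer` and `query_gsi` are made up). The base "table" is keyed by `order_id`, so finding orders by `customer_id` means touching every item, while a GSI-like copy keyed by `customer_id` touches only the matching ones.

```python
from collections import defaultdict

items = [
    {"order_id": "o1", "customer_id": "c1", "total": 30},
    {"order_id": "o2", "customer_id": "c2", "total": 10},
    {"order_id": "o3", "customer_id": "c1", "total": 20},
]

# Base table: the partition key is order_id.
table = {item["order_id"]: item for item in items}


def scan_by_customer(customer_id):
    """Scan-style lookup: examine every item, keep the ones that match."""
    examined = 0
    matches = []
    for item in table.values():
        examined += 1
        if item["customer_id"] == customer_id:
            matches.append(item)
    return matches, examined


# GSI-style copy of the same items, partitioned by customer_id instead.
gsi = defaultdict(list)
for item in table.values():
    gsi[item["customer_id"]].append(item)


def query_gsi(customer_id):
    """Query-style lookup: go straight to one partition of the index."""
    matches = gsi.get(customer_id, [])
    return matches, len(matches)


print(scan_by_customer("c1")[1], query_gsi("c1")[1])
```

With three items the gap is trivial, but the scan's work grows with the table while the index lookup grows only with the number of matching items. In the real service the index is maintained for you and consumes its own capacity.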
Upvotes: 10