Reputation: 976
Lets say I have a Dynamo DB table named 'student_course'. I want to store the course that each student is taking in a University. One student can take multiple courses at a time and one course can have many student at a time. So basically it is a multi-mapping.
My data access pattern has only one use case -
To achieve this I can store the data in these 2 ways -
My Question is - Which query would perform better, if there is any difference, that is? Which one should I choose over the other and why?
Upvotes: 5
Views: 2658
Reputation: 23443
In terms of designing for DDB
Performance, the Get API
is key to milli-seconds data retrieval capability of DDB, therefore it is logical to design your data around this API
Table with Partition Key + Sort Key
Partition Key | Sort Key
--------------+-------------
Course1 | Student1
Course1 | Student2
Advantage:
Get API
to get a Single record by the Partition Key
and the Sort Key
e.g. Get single record where Partition Key = "Course1" and Sort Key = "Student1"Get API
to get a List of records by only the Partition Key
e.g. Get all records where Partition Key = "Course1"Disadvantage:
Sort Key
(i.e. Student) and not the Partition Key
(i.e. Course), you won't be able to use the Get API
to retrieve the record only by the Sort Key
Note: In general, the efficiency (such that queries do not hit the ReadThroughput
exception "Roof" easily) of DDB Get API
queries is quite coupled with the uniqueness and distribution of the Partition Key
. The more the partition keys you have and spreaded out, the better the performance
Table with Partition Key only
Partition Key Only
--------------------
Course1#Student1
Course1#Student2
Advantage:
Get API
to get a Single record by the Partition Key
e.g. Get single record where Partition Key = "Course1#Student1"Disadvantage:
Get API
to get a List of records using only a subset of the Partition key
e.g. Get a List of records where Partition Key = "Course1"About GSIs
Note: It is a common scenario to add Global Secondary Index on Tables to support Get API
calls using an alternate key, like Get a List of records where GSI Partition Key
= Course Name
Partition Key Only | Non Key Attribute (Course) For GSI
---------------------+---------------------------
Course1#Student1 | Course1
Course1#Student2 | Course1
You can have up to 20 GSI
Indexes (soft limit), with the option to remove this limit through a Support request
Partition Key Only | Non Key Attribute (Course) For GSI | Lecturer (For GSI 2)
---------------------+------------------------------------+---------------------
Course1#Student1 | Course1 | Lecturer1
Course1#Student2 | Course1 | Lecturer1
Conclusion
I would design a Table to have as many Unique values for the Partition Keys
as possible if the Performance is key i.e. Partition Key = Course1#Student1 VS Partition Key = Course1, Sort Key = Student1
Add GSIs
to Tables on-demand if you need to query by an alternate key
(Historically GSIs
were limited to 5 per Table, and had to be specified during Table creation, but these limitations are since lifted)
Upvotes: 5