Reputation: 2170
Coming from Azure’s DocumentDB (Cosmos db) background to AWS DynamoDB for a application where dynamo db is already being used.
I have a confusion around partition key on DynamoDB.
As of my understanding partition key is used segregate the data to different partitions when it grows, however many suggest using primary key as partition key, such as User Id, Customer Id, Order id. In which case I am not sure how we achieve better performance, as we have many partitions. So a query may need to be executed in multiple servers.
For an example, if I wanted to develop a multi-tenant system where I will use single table to store the all tenant’s data but partitioned using tenant id. I will do as mentioned below in document db.
Create objects with following schema.
Primary key: Order Id
Partition key: Tenant id
SELECT * FROM Orders o WHERE o.tenantId="tenantId"
SELECT * FROM Orders o WHERE o.Id='id' and o.tenantId="tenantId"
SELECT * FROM Orders o WHERE o.tenantId="tenantId" order by o.CreatedData
//by default all fields in document db are indexed, so order by just works
How do I achieve same operations in dynamo db?
Upvotes: 2
Views: 1363
Reputation: 8566
As of my understanding partition key is used segregate the data to different partitions when it grows, however many suggest using primary key as partition key, such as User Id, Customer Id, Order id. In which case I am not sure how we achieve better performance, as we have many partitions.
You are correct that the partition key is used in DynamoDB to segregate data to different partitions. However partition key and physical partition in which items recides is not a one-to-one mapping.
The number of partitions is decided based on your RCU/WCU in such a way that all RCU/WCU can be utilized.
In dynamo db Primary keys are not need to be unique. I understand this is very confusing as compare to all other products out there, but this is the fact. Primary keys (in dynamoDB) are actually "Partition key".
This is a wrong understanding. The concept of the primary key is exactly the same as SQL standards with extra restrictions as you would expect a NoSQL database to have. In short, you can have a partition key as a primary key or partition key and sort key as a composite primary key.
Upvotes: 1
Reputation: 2170
Finally I have found how to use dynamodb properly.Thanks to [@Jesse Carter], his comment was so helpful to understand dynamo db better. I am answering my Own Question now.
Compared to other NoSQL db's DynamoDB is bit difficult as terms are too much confusing, below I have mentioned simplified dynamodb table design for few common scenarios.
In dynamo db Primary keys are not need to be unique, I understand this is very confusing as compare to all other products out there, but this is the fact. Primary keys (in dyanmodb) are actually "Partition key".
You always required to supply Primary key as part of query
Assume you want to create a table with Id, and multiple other attributes. Also you query based on Id attribute only. in this case Id could be a Primary key.
|---------------------|------------------|
| User Id | Name |
|---------------------|------------------|
| 12 | value1 |
| 13 | value2 |
|---------------------|------------------|
We can have User Id as "Primary Key (Partition Key)"
Now say we want to store messages for users as shown below, and we will query by user id to retrieve all messages for user.
|---------------------|------------------|
| User Id | Message Id |
|---------------------|------------------|
| 12 | M1 |
| 12 | M2 |
| 13 | M3 |
|---------------------|------------------|
Still "User Id" shall be a primary key for this table. Please remember Primary key in dynamodb not need to be unique per document. Message Id can be Sort key
Sort key is a kind of unique key within a partition. Combination of Partition key, and Sort key has to be unique.
If you are using Visual Studio, you can install AWS tool kit for visual studio to create Local tables on your machine for testing.
Hash key, Range key. Always surprises with dynamo db isn't? :) . Actually
(Primary Key = Partition Key = Hash Key) != Your application objects primary key
As per our second scenario "Message Id" suppose to be primary key for our application, however as per DynamoDB terms user Id became a primary key to achieve partition benefits.
(Sort key = Range Key) = Could be a application objects primary
We can create indexes within partition and that is called local secondary index. For example if we want retrieve messages for user based on message status
|------------|--------------|------------|
| User Id | Message Id | Status |
|------------|--------------|------------|
| 12 | M1 | 1 |
| 12 | M2 | 0 |
| 13 | M3 | 2 |
|------------|--------------|------------|
Primary Key: User Id
Sort Key: Message Id
Secondary Local Index: Status
As the name states it is a global index. If we want to retrieve single message based on id, without partition key i.e. user id. Then we shall create a global index based on Message id.
Upvotes: 2
Reputation: 8571
Please see the explanantion from AWS documentation,
The primary key uniquely identifies each item in a table. The primary key can be simple (partition key) or composite (partition key and sort key).
When it stores data, DynamoDB divides a table's items into multiple partitions, and distributes the data primarily based upon the partition key value. Consequently, to achieve the full amount of request throughput you have provisioned for a table, keep your workload spread evenly across the partition key values. Distributing requests across partition key values distributes the requests across partitions.
For example, if a table has a very small number of heavily accessed partition key values, possibly even a single very heavily used partition key value, request traffic is concentrated on a small number of partitions – potentially only one partition. If the workload is heavily unbalanced, meaning that it is disproportionately focused on one or a few partitions, the requests will not achieve the overall provisioned throughput level. To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.
This does not mean that you must access all of the partition key values to achieve your throughput level; nor does it mean that the percentage of accessed partition key values needs to be high. However, be aware that when your workload accesses more distinct partition key values, those requests will be spread out across the partitioned space in a manner that better utilizes your allocated throughput level. In general, you will utilize your throughput more efficiently as the ratio of partition key values accessed to the total number of partition key values in a table grows.
Upvotes: 1