Reputation: 2170

Partition in DynamoDB

I am coming from an Azure DocumentDB (Cosmos DB) background to AWS DynamoDB, for an application where DynamoDB is already in use.

I am confused about the partition key in DynamoDB.

As I understand it, the partition key is used to segregate data into different partitions as it grows. However, many suggest using a primary key such as User Id, Customer Id, or Order Id as the partition key. In that case I am not sure how we achieve better performance, since we end up with many partitions, so a query may need to be executed on multiple servers.

For example, suppose I want to develop a multi-tenant system where a single table stores all tenants' data, partitioned by tenant id. In DocumentDB I would do the following.

1) Storing data

Create objects with the following schema.

Primary key: Order Id
Partition key: Tenant id

2) Retrieving all records for a tenant

SELECT * FROM Orders o WHERE  o.tenantId="tenantId"

3) Retrieving a record by id for a tenant

SELECT * FROM Orders o WHERE o.Id='id' and o.tenantId="tenantId"

4) Retrieving all records for a tenant with sorting

 SELECT * FROM Orders o WHERE  o.tenantId="tenantId" order by o.CreatedData
 //by default all fields in document db are indexed, so order by just works

How do I achieve the same operations in DynamoDB?

Upvotes: 2

Answers (3)

Mayank Patel

Reputation: 8566

Quoting the question: "As of my understanding partition key is used segregate the data to different partitions when it grows, however many suggest using primary key as partition key, such as User Id, Customer Id, Order id. In which case I am not sure how we achieve better performance, as we have many partitions."

You are correct that the partition key is used in DynamoDB to segregate data into different partitions. However, partition key values and the physical partitions in which the items reside do not map one-to-one: many partition key values can share one physical partition.
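To make the "not one-to-one" point concrete, here is a minimal sketch of hash-based placement. DynamoDB's real hash function and partition count are internal, so the use of MD5, the count of 4 partitions, and the `tenant-N` key names are all assumptions for illustration only.

```python
import hashlib


def physical_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key value to one of `partition_count` buckets.

    MD5 is only a stand-in here; DynamoDB's actual hash function is internal.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# 1000 distinct partition key values (hypothetical tenant ids) ...
tenants = [f"tenant-{i}" for i in range(1000)]

# ... end up spread over only 4 physical partitions.
placement = {}
for tenant in tenants:
    placement.setdefault(physical_partition(tenant, 4), []).append(tenant)

for partition in sorted(placement):
    print(f"partition {partition}: {len(placement[partition])} tenant ids")
```

So having a thousand partition key values does not mean a thousand servers; a query for one tenant still goes to the single partition that holds that tenant's items.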

The number of partitions is decided based on your provisioned read and write capacity units (RCU/WCU), in such a way that all of the RCU/WCU can be utilized.

Quoting another answer: "In dynamo db Primary keys are not need to be unique. I understand this is very confusing as compare to all other products out there, but this is the fact. Primary keys (in dynamoDB) are actually 'Partition key'."

This understanding is wrong. The concept of the primary key is the same as in the SQL standard, with the extra restrictions you would expect of a NoSQL database. In short, the primary key is either a partition key alone, or a composite of partition key and sort key.
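Applied to the multi-tenant example from the question, a composite primary key could look like the sketch below. The table name `Orders` and the attribute names `tenantId` and `orderId` are assumptions taken from the question's queries, and on-demand billing is just one possible choice. The functions only build request parameters; with boto3 they would be passed as `client.create_table(**...)`, `client.query(**...)` and `client.get_item(**...)`.

```python
TABLE_NAME = "Orders"  # hypothetical table name


def create_table_params() -> dict:
    """Composite primary key: tenantId (partition key) + orderId (sort key)."""
    return {
        "TableName": TABLE_NAME,
        "AttributeDefinitions": [
            {"AttributeName": "tenantId", "AttributeType": "S"},
            {"AttributeName": "orderId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "tenantId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "orderId", "KeyType": "RANGE"},   # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def orders_for_tenant_params(tenant_id: str) -> dict:
    """Query parameters: all records for one tenant."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "tenantId = :t",
        "ExpressionAttributeValues": {":t": {"S": tenant_id}},
    }


def order_by_id_params(tenant_id: str, order_id: str) -> dict:
    """GetItem parameters: one record, addressed by its full primary key."""
    return {
        "TableName": TABLE_NAME,
        "Key": {"tenantId": {"S": tenant_id}, "orderId": {"S": order_id}},
    }


print(orders_for_tenant_params("tenant-42"))
print(order_by_id_params("tenant-42", "order-7"))
```

The pair (tenantId, orderId) is what has to be unique; a tenant id on its own can repeat across many orders.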

Upvotes: 1

CreativeManix

Reputation: 2170

I finally found out how to use DynamoDB properly. Thanks to [@Jesse Carter], whose comment was very helpful in understanding DynamoDB better. I am answering my own question now.

Compared to other NoSQL databases, DynamoDB is a bit difficult because its terms are confusing. Below is a simplified DynamoDB table design for a few common scenarios.

Primary key

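In DynamoDB, the partition key does not need to be unique on its own when the table also has a sort key. I understand this is confusing compared to other products, but this is how it works. What is loosely called the "primary key" in DynamoDB discussions is often just the partition key; strictly, the primary key is the partition key alone or the partition key plus the sort key, and that whole key must be unique.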

Finding 1

You are always required to supply the partition key as part of a query.

Scenario 1 - Key value(s) store

Assume you want to create a table with an Id and multiple other attributes, and you query based on the Id attribute only. In this case, Id can be the primary key.

|---------------------|------------------|
|      User Id        |       Name       |
|---------------------|------------------|
|          12         |      value1      |
|          13         |      value2      |
|---------------------|------------------|

We can have User Id as the "Primary Key (Partition Key)".

Scenario 2

Now say we want to store messages for users as shown below, and we will query by user id to retrieve all messages for a user.

|---------------------|------------------|
|      User Id        |       Message Id |
|---------------------|------------------|
|          12         |      M1          |
|          12         |      M2          |
|          13         |      M3          |
|---------------------|------------------|

"User Id" is still the partition key for this table. Remember that the partition key in DynamoDB does not need to be unique per item. Message Id can be the sort key.

So what is a sort key?

A sort key is a kind of unique key within a partition key value. The combination of partition key and sort key has to be unique.
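As a rough mental model (not how DynamoDB is implemented), the table from Scenario 2 can be pictured as a dictionary of dictionaries. The `TinyTable` class and the `text` attribute are made up for illustration.

```python
class TinyTable:
    """Toy in-memory model of a table with a partition key and a sort key."""

    def __init__(self):
        # partition key value -> {sort key value -> item}
        self._partitions = {}

    def put(self, partition_key, sort_key, item):
        # Writing the same (partition key, sort key) pair again replaces the
        # existing item, which is why the pair has to be unique.
        self._partitions.setdefault(partition_key, {})[sort_key] = item

    def query(self, partition_key):
        # A query names one partition key value and returns its items
        # ordered by sort key.
        rows = self._partitions.get(partition_key, {})
        return [rows[sort_key] for sort_key in sorted(rows)]


table = TinyTable()
table.put(12, "M2", {"text": "second message"})
table.put(12, "M1", {"text": "first message"})
table.put(13, "M3", {"text": "another user"})
table.put(12, "M1", {"text": "first message, edited"})  # same pair: replaces

print(table.query(12))
```

User 12 appears many times as a partition key value, but each (User Id, Message Id) pair exists at most once.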

Creating table locally

If you are using Visual Studio, you can install the AWS Toolkit for Visual Studio to create local tables on your machine for testing.

Note: the toolkit's table-creation screen adds some more terms: hash key and range key. DynamoDB is always full of surprises, isn't it? :) Actually:

(Partition Key = Hash Key) != your application object's primary key

In our second scenario, "Message Id" is supposed to be the primary key for our application; in DynamoDB terms, however, User Id became the partition key so that we gain the benefits of partitioning.

(Sort Key = Range Key) = could be your application object's primary key

Local Secondary Indexes

We can create an index within a partition key value; this is called a local secondary index. For example, suppose we want to retrieve a user's messages based on message status.

|------------|--------------|------------|
|  User Id   |   Message Id |   Status   |
|------------|--------------|------------|
|     12     |      M1      |     1      |
|     12     |      M2      |     0      |
|     13     |      M3      |     2      |
|------------|--------------|------------|

Partition Key: User Id

Sort Key: Message Id

Local Secondary Index: Status
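A sketch of how that design might be expressed as request parameters. The table name `Messages`, the index name `StatusIndex`, and the attribute names `UserId`, `MessageId` and `Status` are assumptions; the dictionaries would be passed to boto3's `create_table` and `query`. Note that a local secondary index shares the table's partition key and has to be declared when the table is created.

```python
def messages_table_params() -> dict:
    """Table keyed by UserId + MessageId, with an LSI on UserId + Status."""
    return {
        "TableName": "Messages",
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "N"},
            {"AttributeName": "MessageId", "AttributeType": "S"},
            {"AttributeName": "Status", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "MessageId", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "StatusIndex",
                # Same partition key as the table, different sort key.
                "KeySchema": [
                    {"AttributeName": "UserId", "KeyType": "HASH"},
                    {"AttributeName": "Status", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def messages_by_status_params(user_id: int, status: int) -> dict:
    """Query parameters: one user's messages with a given status."""
    return {
        "TableName": "Messages",
        "IndexName": "StatusIndex",
        # Name placeholders avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#u = :u AND #s = :s",
        "ExpressionAttributeNames": {"#u": "UserId", "#s": "Status"},
        "ExpressionAttributeValues": {
            ":u": {"N": str(user_id)},  # numbers travel as strings
            ":s": {"N": str(status)},
        },
    }


print(messages_by_status_params(12, 1))
```

The same pattern, with a creation-date attribute as the index sort key, would cover the "all records for a tenant with sorting" case from the question.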

Global Secondary Indexes

As the name states, this is a global index. If we want to retrieve a single message by its id without the partition key (i.e. user id), we create a global secondary index on Message Id.
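A sketch of such an index and the matching query. The index name `MessageIdIndex` and the `Messages` / `MessageId` names are assumptions; the index definition would go into the `GlobalSecondaryIndexes` list of a boto3 `create_table` call (with `MessageId` also listed in `AttributeDefinitions`), and the query parameters into `query`.

```python
# Hypothetical global secondary index keyed by MessageId alone.
MESSAGE_ID_INDEX = {
    "IndexName": "MessageIdIndex",
    "KeySchema": [{"AttributeName": "MessageId", "KeyType": "HASH"}],
    "Projection": {"ProjectionType": "ALL"},
}


def message_by_id_params(message_id: str) -> dict:
    """Query parameters: look a message up by id, without knowing the user."""
    return {
        "TableName": "Messages",
        "IndexName": MESSAGE_ID_INDEX["IndexName"],
        "KeyConditionExpression": "MessageId = :m",
        "ExpressionAttributeValues": {":m": {"S": message_id}},
    }


print(message_by_id_params("M3"))
```

Unlike the table's own primary key, the key of a global secondary index is not required to be unique, so this is a query that may return more than one item rather than a get of exactly one.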

Upvotes: 2

Vijayanath Viswanathan

Reputation: 8571

Please see the explanation from the AWS documentation:

The primary key uniquely identifies each item in a table. The primary key can be simple (partition key) or composite (partition key and sort key).

When it stores data, DynamoDB divides a table's items into multiple partitions, and distributes the data primarily based upon the partition key value. Consequently, to achieve the full amount of request throughput you have provisioned for a table, keep your workload spread evenly across the partition key values. Distributing requests across partition key values distributes the requests across partitions.

For example, if a table has a very small number of heavily accessed partition key values, possibly even a single very heavily used partition key value, request traffic is concentrated on a small number of partitions – potentially only one partition. If the workload is heavily unbalanced, meaning that it is disproportionately focused on one or a few partitions, the requests will not achieve the overall provisioned throughput level. To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.

This does not mean that you must access all of the partition key values to achieve your throughput level; nor does it mean that the percentage of accessed partition key values needs to be high. However, be aware that when your workload accesses more distinct partition key values, those requests will be spread out across the partitioned space in a manner that better utilizes your allocated throughput level. In general, you will utilize your throughput more efficiently as the ratio of partition key values accessed to the total number of partition key values in a table grows.
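The effect of an unbalanced workload can be illustrated with a small simulation. This is only a sketch: MD5 stands in for DynamoDB's internal hash function, and the 4 partitions and `tenant-N` keys are made-up numbers and names.

```python
import hashlib
from collections import Counter


def physical_partition(partition_key: str, partition_count: int = 4) -> int:
    """Stand-in for DynamoDB's internal hashing of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def busiest_partition_share(requested_keys, partition_count: int = 4) -> float:
    """Fraction of all requests that land on the most loaded partition."""
    hits = Counter(physical_partition(k, partition_count) for k in requested_keys)
    return max(hits.values()) / len(requested_keys)


# 1000 requests spread evenly over 1000 partition key values.
uniform = [f"tenant-{i}" for i in range(1000)]

# 1000 requests where one partition key value receives 900 of them.
skewed = ["tenant-0"] * 900 + [f"tenant-{i}" for i in range(1, 101)]

print("uniform workload:", busiest_partition_share(uniform))
print("skewed workload: ", busiest_partition_share(skewed))
```

With the skewed workload, at least 90% of the traffic hits a single partition, so most of the throughput provisioned for the other partitions goes unused.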

Upvotes: 1
