AWS DynamoDB Partition Key Design

Question

I read this answer, which clarified a lot of things, but I'm still confused about how I should go about designing my primary key.

First off I want to clarify the idea of WCUs. I get that WCU is the write capacity of max 1kb per second. Does it mean that if writing a piece of data takes 0.25 seconds, I would need 4 of those to be billed 1 WCU? Or each time I write something it consumes 1 WCU, but I could also write X times within 1 second and still be billed 1 WCU?

Usage

I want to create a table that stores the form data for a set of gyms (95% will be waivers, the rest will be incidents reports). Most of the time, each forms will be accessed directly via its unique ID. I also want to query the forms by date, form, userId, etc..

We can assume an average of 50k forms per gym

Options

First option is straight forward: having the formId be the partition key. What I don't like about this option is that scan operations will always filter out 90% of the data (i.e. the forms from other gyms), which isn't good for RCUs.
Second option is that I would make the gymId the partition key, and add a sort key for the date, formId, userId. To implement this option I would need to know more about the implications of having 50k records on one partition key.
Third option is to have one table per gyms and have the formId as partition key. This seems to be like the best option for now, but I don't really like the idea of having a a large number of tables doing the same thing in my account.

Is there another option? Which one of the three is better?

Edit: I'm assuming another option would be SimpleDB?

Charles · Accepted Answer

Every call that writes to DDB consumes at least 1 (standard) or 2 (transactional) WCUs. Assuming your items are less the 1KB in size.

See Provisioned Throughput key point

Item sizes for writes are rounded up to the next 1 KB multiple. For example, writing a 500-byte item consumes the same throughput as writing a 1 KB item.

So writing 4 items in one second will require 4 WCU. But "burst" mode means you might temporarily be able to write 4 items a second for a short period of time in a table that's only provisioned for 2 WCU.

As far as your proposed options. It depends. You mentioned some general access patterns, but not specifics nor if those are the only ones you need.

In an RDBMS, you have to know ahead of time how you want to store the data. But accessing that data is very flexible.

In DDB, you have to know how you need to access the data, but the storage structure is flexible.

Some general feedback:

Scan() is an operation of last resort, it should be used very, very infrequently if at all.
50K records for a partition key isn't a big deal. What matters (but less than it used to) is how distributed your accesses are to each partition key. Ideally, you want a uniform distribution of access across all your partition keys.
one table per gym is a valid multi-tenant strategy. But there's management/overhead costs.

Assuming you actually have mutli-tenants, ie. each gym is an individual customer. Then I'd lean toward having gymid be the hash key so I could take advantage of enforcing tenant isolation via IAM roles as outlined in this article.
Cavet: This could be problematic if tenants are NOT of approximately the same size. But again, less of a problem than it originally was.

AWS DynamoDB Partition Key Design

Answers (2)

Related Questions