Reputation: 413
I read this answer, which clarified a lot of things, but I'm still confused about how I should go about designing my primary key.
First off I want to clarify the idea of WCUs. I get that WCU is the write capacity of max 1kb per second. Does it mean that if writing a piece of data takes 0.25 seconds, I would need 4 of those to be billed 1 WCU? Or each time I write something it consumes 1 WCU, but I could also write X times within 1 second and still be billed 1 WCU?
Usage
I want to create a table that stores the form data for a set of gyms (95% will be waivers, the rest will be incidents reports). Most of the time, each forms will be accessed directly via its unique ID. I also want to query the forms by date, form, userId, etc..
We can assume an average of 50k forms per gym
Options
First option is straight forward: having the formId be the partition key. What I don't like about this option is that scan operations will always filter out 90% of the data (i.e. the forms from other gyms), which isn't good for RCUs.
Second option is that I would make the gymId the partition key, and add a sort key for the date, formId, userId. To implement this option I would need to know more about the implications of having 50k records on one partition key.
Third option is to have one table per gyms and have the formId as partition key. This seems to be like the best option for now, but I don't really like the idea of having a a large number of tables doing the same thing in my account.
Is there another option? Which one of the three is better?
Edit: I'm assuming another option would be SimpleDB?
Upvotes: 0
Views: 745
Reputation: 4865
For your PK design. What data does the app have when a user is going to look for a form? Does it have the GymID, userID, and formID? If so, make a compound key out of that for the PK perhaps? So your PK might look like:
234455::53894302::245
Where 23445 is the GymID, 53894302 is the user's ID, and 245 is the form ID. You might even move the form ID to the sort key and along with a date, you could have an SK of form::245::. Then you could easily get all items of type form for that user, or all form 245s for that user. or all form 245s in 2020 for that user, by using the begins_with() expression in your QUERY.
This might not be an exactly what you should do, but play with it and see what options you come up with. One thing to think about is what happens when a user moves gyms? perhaps in that rare event, you rewrite all of their items in the DB with the new gymID. Perhaps you do not have the gymID in the PK. without a lot more info, it is difficult to say. Hopefully this is enough for you to chew on so you can come up a solution.
Upvotes: 1
Reputation: 23823
Every call that writes to DDB consumes at least 1 (standard) or 2 (transactional) WCUs. Assuming your items are less the 1KB in size.
See Provisioned Throughput key point
Item sizes for writes are rounded up to the next 1 KB multiple. For example, writing a 500-byte item consumes the same throughput as writing a 1 KB item.
So writing 4 items in one second will require 4 WCU. But "burst" mode means you might temporarily be able to write 4 items a second for a short period of time in a table that's only provisioned for 2 WCU.
As far as your proposed options. It depends. You mentioned some general access patterns, but not specifics nor if those are the only ones you need.
In an RDBMS, you have to know ahead of time how you want to store the data. But accessing that data is very flexible.
In DDB, you have to know how you need to access the data, but the storage structure is flexible.
Some general feedback:
Assuming you actually have mutli-tenants, ie. each gym is an individual customer. Then I'd lean toward having gymid be the hash key so I could take advantage of enforcing tenant isolation via IAM roles as outlined in this article.
Cavet: This could be problematic if tenants are NOT of approximately the same size. But again, less of a problem than it originally was.
Upvotes: 1