Kevin Glick
Kevin Glick

Reputation: 157

Query for Latest Item & Proper Use of Partition Keys in DynamoDB

I am creating a DynamoDB table to support an Alexa Skill for use as a podcast player. The way I envision the table is to use the episode number as the Partition Key and the PublicationDate as the optional Sort Key. I have two concerns about designing my table schema in this way.

First, say I wanted to query the table to get the latest episode - I'm not sure that I can do it in this fashion, as a query requires an equivalence operation on the Partition Key (episode = X), which I wouldn't know in advance. Am I correct in believing that a scan would be quite an expensive operation if the podcast has a large number of episodes (say more than 1000)?

I would need to look at each item in the table, compare its episode number (Partition Key value) to the previous returned Item and update a variable with the more recent Item each time one was found until all Items in the table were cycled through in this way.

Secondly, DynamoDB best practices say two things which work incongruently in my use-case (probably a sign that my design is flawed). First, the Partition Key should be unique or close to unique. Second, queries should be expected to be more or less uniformly dispersed amongst the keys. In my case, though, while the Partition Key would indeed be unique, I would expect the vast majority of queries to be targeting the latest Partition Key in the table, for the Item containing data for the latest podcast episode. What would be the impact on performance if, say for example, the skill gets 1000 queries on any given day all aimed at a single Partition Key?

Does anyone have a better table architecture solution for this type of data?

Thanks to everyone in advance!

Upvotes: 1

Views: 464

Answers (1)

Necevil
Necevil

Reputation: 2912

Question 1:

First, say I wanted to query the table to get the latest episode - I'm not sure that I can do it in this fashion, as a query requires an equivalence operation on the Partition Key (episode = X), which I wouldn't know in advance. Am I correct in believing that a scan would be quite an expensive operation if the podcast has a large number of episodes (say more than 1000)?

You are right that you would NOT be able to query for the latest episode because each episode is in their own Partition. Partitions are almost like different isolated tables so there is no way to query across all Partitions without Scanning (as you said).

Question 2:

Secondly, DynamoDB best practices say two things which work incongruently in my use-case (probably a sign that my design is flawed). First, the Partition Key should be unique or close to unique. Second, queries should be expected to be more or less uniformly dispersed amongst the keys. In my case, though, while the Partition Key would indeed be unique, I would expect the vast majority of queries to be targeting the latest Partition Key in the table, for the Item containing data for the latest podcast episode. What would be the impact on performance if, say for example, the skill gets 1000 queries on any given day all aimed at a single Partition Key?

The issue here is two fold, AWS expects you to be reading (and writing) equally to each partition (or close to equally) so basically what is going to happen is you are going to pay for Write Units (and Read Units) on the partitions you are NOT using, even though you are not using them.

Exactly how much more that is going to run you is going to depend on the number of times you QUERY the database, however, Reading is much cheaper than writing and 1000 reads is basically nothing on a table with 1000 items. ie. You MIGHT be able to get away with it but it's not ideal.

Alternate Table Schema / Key Design

  1. What other Queries will you make? ie. other than "Check for latest Episode"
  2. How many Podcasts are added per day? week? year?
  3. Are there multiple 'shows' or categories that could be used for Partition Keys that might have more even distribution and could be 'known'?

Upvotes: 3

Related Questions