Clinton Bosch
Clinton Bosch

Reputation: 2589

Querying DynamoDB table for all items with date value within range

I have a table that contains 'match' objects, In short a match has a matchId which is unique, a date (as well as a number of other columns which are not important for this question). When creating the table I created it with a single primary key being the matchId.

Now I would like to query the table for all matches within a certain date range.

I am pretty certain I need to set up a GSI and I also understand I need to make the date field the range key since I need to use a BETWEEN operator which does not work on the primary key (Only equals on primary key).

However I am not sure what to make the primary key in this case and then how this works since I think I need to define the primary key as part of the HASH in order to query.

I come from a relationalDB world where this would be so easy and I am sure I am missing a fundamental concept here

Upvotes: 0

Views: 993

Answers (1)

fedonev
fedonev

Reputation: 25799

A secondary index gives you an alternate Primary Key to query on. To query anything in DynamoDB (index or table), you must specify one-and-only-one Partition Key value and optionally an expression for the Sort Key (together, the Primary Key).

One way to query all records by date is to define a Global Secondary Index with the date as the index's PK and matchId as the SK. You need both to ensure item uniqueness.

GSI1PK        GSI1SK
2022-05-31    matchId_1
2022-05-31    matchId_2
2022-06-01    matchId_1

You would execute one query operation per day in your date range.

There are other patterns to consider. If you had a time component in your data, you could optionally prepend it to the SK (e.g. HH:MM:SS#matchId) to optionally permit from-to time querying within a day. Alternatively, an index PK like YYYY-MM and SK like YYYY-MM-DD#matchId might make sense to reduce the number of queries needed for wide date ranges, at the cost of lower-cardinality partitions. The docs offer some best practices for time series data, although it's not directly applicable in your case. The optimal design depends on your query patterns and data/query volumes.

Upvotes: 1

Related Questions