Shubham Kanodia
Shubham Kanodia

Reputation: 6236

DynamoDB: Fetch all items that match one of possible keys

Say I have the following table —

itemA | itemB | relationScore | type
----------------------------------------
"s"   |   "a" |      1.0      |  "foo"
"s"   |   "b" |      1.0      |  "bar"
...
 

Here, itemA + itemB is unique per row, and I've made them the hash key and range key respectively.

My query, however, needs me to fetch all items such that —

  1. itemA is equal to one of (A,B,C,D ....) (i.e. a list of options)
  2. type is equal to "foo"

How do I build indexes to be able to do this without using scans / requiring multiple queries?

Note: I don't want to query on just type and filter it later in memory because type has very low cardinality, and would end up returning a humongous list back.

Upvotes: 0

Views: 273

Answers (2)

Seth Geoghegan
Seth Geoghegan

Reputation: 5747

For your first access pattern, DynamoDB provides a BatchGetItem operation which can return up to 16MB of data and contain as much as 100 items.

Your second access pattern can be accessed by creating a secondary index on the type field. This index would logically group all items of the same type into the same partition, which you could retrieve with a single query operation.

Edit: I misinterpreted the question and thought we were discussing two separate access patterns, not a single access pattern. As described, the existing data model won't support a single query operation.

The only way to search across partition keys (condition 1) is by a scan operation, or multiple queries. If you want to do this in a single query, you'll need to store your data such that the list of options and type is grouped together in a single partition. This is effectively "pre joining" your data.

Unless we can exploit features of your data that are not obvious via your example (e.g. itemA is lexicographically sortable), you are going to be stuck with multiple queries or a scan operation.

Upvotes: 0

Charles
Charles

Reputation: 23823

Maybe...

If you created a GSI with
hash key: type
range key: itemA

Then you could query using the GSI and (type = "foo", itemA between "A" and "D")

But obviously that requires itemA values to be a contiguous range. Which seems to be the case for your example, but may not be the case for the actual data.

EDIT
Since the itemA values aren't actually contiguous, and DDB doesn't support IN you're stuck with multiple queries.

This isn't the end of the world, as you could do the queries in parallel. In that case, I'd probably have the GSI with
hash key: itemA
range key: type

thus ensuring that each query is partition specific. (Even if your data or I/O requires are low enough that DDB doesn't actually create individual partitions)

Upvotes: 1

Related Questions