Reputation: 6236
Say I have the following table —
itemA | itemB | relationScore | type
----------------------------------------
"s" | "a" | 1.0 | "foo"
"s" | "b" | 1.0 | "bar"
...
Here, itemA
+ itemB
is unique per row, and I've made them the hash key and range key respectively.
My query, however, needs me to fetch all items such that —
itemA
is equal to one of (A,B,C,D ....) (i.e. a list of options)type
is equal to "foo"How do I build indexes to be able to do this without using scans / requiring multiple queries?
Note: I don't want to query on just type
and filter it later in memory because type
has very low cardinality, and would end up returning a humongous list back.
Upvotes: 0
Views: 273
Reputation: 5747
For your first access pattern, DynamoDB provides a BatchGetItem operation which can return up to 16MB of data and contain as much as 100 items.
Your second access pattern can be accessed by creating a secondary index on the type
field. This index would logically group all items of the same type
into the same partition, which you could retrieve with a single query
operation.
Edit: I misinterpreted the question and thought we were discussing two separate access patterns, not a single access pattern. As described, the existing data model won't support a single query operation.
The only way to search across partition keys (condition 1) is by a scan
operation, or multiple queries. If you want to do this in a single query, you'll need to store your data such that the list of options and type is grouped together in a single partition. This is effectively "pre joining" your data.
Unless we can exploit features of your data that are not obvious via your example (e.g. itemA is lexicographically sortable), you are going to be stuck with multiple queries or a scan operation.
Upvotes: 0
Reputation: 23823
Maybe...
If you created a GSI with
hash key: type
range key: itemA
Then you could query using the GSI and (type = "foo", itemA between "A" and "D")
But obviously that requires itemA values to be a contiguous range. Which seems to be the case for your example, but may not be the case for the actual data.
EDIT
Since the itemA values aren't actually contiguous, and DDB doesn't support IN
you're stuck with multiple queries.
This isn't the end of the world, as you could do the queries in parallel. In that case, I'd probably have the GSI with
hash key: itemA
range key: type
thus ensuring that each query is partition specific. (Even if your data or I/O requires are low enough that DDB doesn't actually create individual partitions)
Upvotes: 1