user1204395
user1204395

Reputation: 568

DynamoDB hierarchical data and condition table design

I'm having problems designing my DynamoDB table to support a fairly simple access pattern. I hope you can help me a little bit :)

I have 4 different products types (A, B, C and D) that have a price and a location (country#state#city).

The access patterns are:

  1. Filter by product type
  2. Filter by product type and location
  3. Filter by product type, location and price
  4. Filter by product type, location and price and sort by price

The problem is that a product located in USA#NY#NY must be also available in USA#NY and USA. Also, the user needs to be able to filter by price and sort by expensive/cheap first.

Example:

Data:

Product: ID_1 | Type A | Location USA#NY#NY | Price 100$
Product: ID_2 | Type A | Location USA#NY#NY | Price 200$

Use case:

Expected result:

Both products should be displayed to both users in the right order even though they search in different areas.


To be able to filter by location and price and sort by price I came up with this solution, however, a lot of data gets duplicated and I am sure there must be a much better solution:

PK |     SK & GSI PK     | GSI SK | Other product details (duplicated data)
---------------------------------------------------------------------------
ID | TYPE                | PRICE  | Image, name, etc
ID | TYPE#USA            | PRICE  | Image, name, etc
ID | TYPE#USA#NY         | PRICE  | Image, name, etc
ID | TYPE#USA#NY#NY      | PRICE  | Image, name, etc

This solves every access pattern:

  1. Filter by product type

    GSI PK = TYPE

  2. Filter by product type and location

    GSI PK = begins_with(TYPE#USA#NY#...)

  3. Filter by product type, location and price

    GSI PK = TYPE#USA & GSI SK > 150

    GSI PK = TYPE#USA#NY & GSI SK > 150

  4. Filter by product type, location and price and sort by price

    GSI PK = TYPE#USA & GSI SK > 150 ScanIndexForward true/false

    GSI PK = TYPE#USA#NY & GSI SK > 150 ScanIndexForward true/false

Reads are efficient but a lot of data gets duplicated (price and product details) and updating an item requires multiple writes.


Is it possible to achieve this without duplicating all the product details?

Upvotes: 4

Views: 672

Answers (1)

Charles
Charles

Reputation: 23783

I think you're misunderstanding the hierarchical pattern

You only need
Table
PK = ID

GSI
PK = TYPE
SK = COUNTRY#STATE#CITY

You can then query the GSI with

  • Query(GSI, PK = 'TYPEA')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#')
  • Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#')

Filtering on price could be added to any of the above queries.
Query(GSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#', filter price > 100.00)

Note that filtering this way does not save any read capacity, it may be more effective to simply filter client side.

The point is you only need (and are allowed) 1 row in the GSI per row in the table

The only question, do you expect more than 10GB of data for any of the types? A GSI, like DDB table itself has a 10GB limit per partition.
(2022-11 Update) DDB supports more than 10GB of data for a given partition key, if there are no Local Secondary Indexes. Additionally, GSI indexes do not have the 10GB partition limit.

If you do expect more than 10GB per type, I'd reconsider the idea that type is a valid access pattern. Nobody is going to scroll through 10GB of data.

Honestly, I'm not a fan of DDB tables that don't handle at least one of the anticipated access patterns. If type is that big of a driver, I'd consider a table like so:
Table
PK = TYPE
SK = ID

LSI
PK = (same as table)
SK = COUNTRY#STATE#CITY

Now your queries become

  • Query(table, PK = 'TYPEA')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#NY#')
  • Query(LSI, PK = 'TYPEA', SK begins with 'USA#NY#NY#')

and you don't have to pay extra for a GSI.

EDIT
When considering the price filter, is it really going to make a big difference in performance or cost. You pay for each 1MB RCU of data regardless of rather that data returned is 1 row or 100 rows. So just how big are your rows and how big a spread in price to you expect to filter across?

Have you considered other options besides or in addition to dynamo? Aurora RDS certainly would provide the flexibility in query that you seem to need much, much easier. Or perhaps adding Elasticsearch in addition to DDB.

Upvotes: 3

Related Questions