Modelling Parent/Child/Subchild relationships in DynamoDB

Question

I'm developing the following rest services:

GET /parents - Get all parents
GET /parents/{id} - Get parent by id
POST /parents - Add new parent

GET /parents/{pid}/children - Get all children for parent pid
GET /parents/{pid}/children/{id} - Get child by id for parent pid
POST /parents/{pid}/children - Add new child for parent pid

GET /parents/{pid}/children/{cid}/subchildren - Get all subchildren for parent pid and child cid
GET /parents/{pid}/children/{cid}/subchildren{id} - Get subchildchild by id for parent pid and child cid
POST /parents/{pid}/children/{cid}/subchildren - Add new subchild for parent pid and child cid

I have to use DynamoDB (or some other serverless and cheap database option on AWS) for performing CRUD operations (Update and Delete are not a priority as of now). When GET /parents is called, the response will be:

[
  {
    "id": 1,
    "name": "parent1",
    "children": [
      {
        "id": 1,
        "name": "parent1_child1",
        "subchildren": [
          {
            "id": 1,
            "name": "parent1_child1_subchild1"
          },
          {
            "id": 2,
            "name": "parent1_child1_subchild2"
          }
        ]
      },
      {
        "id": 2,
        "name": "parent1_child2",
        "subchildren": [
          {
            "id": 1,
            "name": "parent1_child2_subchild1"
          },
          {
            "id": 2,
            "name": "parent1_child2_subchild2"
          },
          {
            "id": 3,
            "name": "parent1_child2_subchild3"
          }
        ]
      }
    ]
  },
  {
    "id": 2,
    "name": "parent2",
    "children": [
      {
        "id": 1,
        "name": "parent2_child1",
        "subchildren": [
          {
            "id": 1,
            "name": "parent2_child1_subchild1"
          }
        ]
      },
      {
        "id": 2,
        "name": "parent2_child2",
        "subchildren": [
          {
            "id": 1,
            "name": "parent2_child2_subchild1"
          },
          {
            "id": 2,
            "name": "parent2_child2_subchild2"
          },
          {
            "id": 3,
            "name": "parent2_child2_subchild3"
          },
          {
            "id": 4,
            "name": "parent2_child2_subchild4"
          }
        ]
      }
    ]
  }
]

How should I model the data in DynamoDB? I am using spring-data-dynamodb for performing storing and retrieving options. What is the best design in this case to create model classes?
Currently I have created different tables as mentioned in https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SampleData.CreateTables.html#SampleData.CreateTables2

frsechet · Accepted Answer

You can do several things:

A: better for cost/performance but requires a bit of extra work

In your table, create a partition key PK and order key OK as both strings (you can use whatever name you want for the keys of course, this is just an example - my point being, the keys are not in your data model, you have to create them upon insert). When you create your items, don't store them as a one big document but store parents, children and subchildren separately. Each item gets as PK the value parent/child/subchild, and as OK the value that you have in each item's name as it has the right parentname_childname_subchildname kind of inheritance, or you can construct it from each item's parents properties.

Upon retrieval, you can get all subchildren of a given parent for example with an expression such as where PK = subchild and OK begins_with parentname_ which effectively returns all items that are subchilds and that belong to a given parent, no matter what the child is. Il you want to also filter on the child, you can do begins with parentname_childname_. If you want only the children of a parent, you can do where PK = child and OK begins_with parentname_. If you know precisely what child you want, instead of using begins_with you can simply do where PK = child and OK = parentname_childname. And so forth... (I'm using SQL-like syntax for ease of writing/reading but it's easy enough to convert to dynamodb query language). The idea is too differentiate your item's actual attributes and how you retrieve your item. You can create completely bogus keys if it helps you find your items easily!

You may not event need to differentiate the child "levels" in the PK, and you could as well use just a generic value there for similar results. The main objective of this method is to have an order key to use begins_with on in a particular partition key.

The main problem with this method is that once you have all your parents/children/subchildren as in your /parents example, you will have to reconstruct the tree entirely in code. But as far as dynamodb is concerned, this is probably the most optimized solution.

B: easier option using filters, but (much) worse cost/performance

Store each parent in a big document then scan your table with a filter like where item.children[].subchildren[].name = parentname_childname_subchildname. It's not quite as efficient and costs much more, as filters are applied on the results of the query so you are retrieving everything first then filtering only what you want, so you pay for all the items first then cut out those you don't want. It works but it's not great, especially if you want something like 5 items out of 100000.

Main advantage is that for /parents call, you get the whole hierarchy as described. But for subroutes /parents/x/children/... you will need to either reformat the object in your code or query only sub attributes in the projection expression.

C: using local secondary indexes

A bit like solution A where you store items separately, with the same PK for all items but with a local secondary index with 3 different order keys: name_parent, name_child, name_subchild. Then you can query each index separately depending on what you want. For this use case it's better to use LSIs than GSIs for cost optimization, but you are free to implement the same idea with GSIs. It will work just as well (with the same limitations about reformatting parent elements to contain children elements as well).

Modelling Parent/Child/Subchild relationships in DynamoDB

Answers (1)

Related Questions