Harry

Reputation: 4835

Querying nested attributes in Amazon DynamoDB

How can I efficiently query on nested attributes in Amazon DynamoDB?

I have a document structure as below, which lets me store related information in the document itself (rather than referencing it).

It makes sense to store the seminars nested in the course, since they will likely be queried alongside the course (they are all course-specific, i.e. a course has many seminars, and a seminar belongs to a course).

In CouchDB, which I’m migrating from, I could write a View that would project some nested attributes for querying. I understand that I can’t project anything that isn’t a top-level attribute into a DynamoDB secondary index, so this approach doesn’t seem to work.

This brings me back to the question: how can I efficiently query on nested attributes without scanning, if I can’t use them as keys in an index?

For example, if I want to get average attendance at Nelson Mandela Theatre, how can I query for the values of registrations and attendees in all seminars that have a location of “Nelson Mandela Theatre” without resorting to a scan?

{
    "course_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "tutors": ["Cognito-sub-1", "Cognito-sub-2"],
    "seminars": [
        {
            "seminar_id": "XXXYYY-12345",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "How to lie with statistics",
            "registrations": "92",
            "attendees": "61"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "155555555",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for dog owners",
            "registrations": "244",
            "attendees": "240"
        },
        {
            "seminar_id": "XXXAAA-54321",
            "epoch_time": "223456789",
            "duration": "4000",
            "location": "Starbucks",
            "name": "Is feral cat population growth a leading indicator for the S&P 500?",
            "registrations": "40"
        }
    ]
}

{
    "course_id": "CJX-5553389",
    "course_name": "Cat Health 101",
    "tutors": ["Cognito-sub-4", "Cognito-sub-9"],
    "seminars": [
        {
            "seminar_id": "TTRHJK-43278",
            "epoch_time": "123456789",
            "duration": "5400",
            "location": "Catwoman Hall",
            "name": "Emotional support octopi for cats",
            "registrations": "88",
            "attendees": "87"
        },
        {
            "seminar_id": "BBBCCC-44444",
            "epoch_time": "123666789",
            "duration": "5400",
            "location": "Nelson Mandela Theatre",
            "name": "Statistical significance for cat owners",
            "registrations": "44",
            "attendees": "44"
        }
    ]
}

Upvotes: 18

Views: 22701

Answers (5)

Vijay Kumar Attri

Reputation: 21

You can use a document path in a filter expression to filter on the nested values. Because seminars is a list, the path has to address an element by index, for example seminars[0].location.

Upvotes: 0

CCarlos

Reputation: 153

This is an example from here that uses a filter expression. It is written for a scan operation, but you may be able to apply something similar to a query instead (take a look at the API):

{
    "TableName": "MyTable",
    "FilterExpression": "#k_Compatible.#k_RAM = :v_Compatible_RAM",
    "ExpressionAttributeNames": {
        "#k_Compatible": "Compatible",
        "#k_RAM": "RAM"
    },
    "ExpressionAttributeValues": {
        ":v_Compatible_RAM": "RAM1"
    }
}

Upvotes: 2

I don't have much experience with DynamoDB yet, but I started studying it because I'm planning to use it for my next project.

As far as I could understand from AWS documentation, the answer to your question is: it's not possible to efficiently query on nested attributes.

Looking at Best Practices, especially Best Practices for Using Secondary Indexes in DynamoDB, the right approach seems to be storing different item types under the same partition key, as shown here. Under the same course_id you would have a generic sort key (sk). The first item would have sk = 'Details' and hold the course's data, followed by other items such as "seminar-1" with that seminar's data, and so on. You would then define a GSI (global secondary index) on each seminar property you want to query, bearing in mind that a table can have only a limited number of GSIs (the default quota is 20; it was formerly 5).
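The item layout described above could be sketched roughly as follows. This is plain Python with no AWS calls; the key names `pk` and `sk`, the `seminar#` sort-key prefix, and the `flatten_course` helper are assumptions made for illustration, not part of the question's schema.

```python
from typing import Any, Dict, List


def flatten_course(course: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split one nested course document into one item per row type.

    Every item shares the partition key (the course_id); the sort key
    tells the course details row apart from each seminar row.
    """
    # Course-level attributes: everything except the nested list.
    details = {k: v for k, v in course.items() if k != "seminars"}
    items = [{"pk": course["course_id"], "sk": "Details", **details}]

    for seminar in course.get("seminars", []):
        items.append({
            "pk": course["course_id"],
            "sk": f"seminar#{seminar['seminar_id']}",  # hypothetical naming scheme
            **seminar,  # location, registrations, ... are now top-level attributes
        })
    return items


course = {
    "course_id": "ABC-1234567",
    "course_name": "Statistics 101",
    "seminars": [
        {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre",
         "registrations": "92", "attendees": "61"},
        {"seminar_id": "XXXAAA-54321", "location": "Starbucks",
         "registrations": "40"},
    ],
}

for item in flatten_course(course):
    print(item["pk"], item["sk"], item.get("location", "-"))
```

Once location is a top-level attribute of each seminar item, it becomes eligible as a GSI key.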

Hope it helps.

Upvotes: 0

Geetanshu Gulati

Reputation: 772

One way to make this work with Scan: store the object in stringified form, like { "language": "[{\"language\":\"Male\",\"proficiency\":\"Female\"}]" }, and then perform a scan operation with language: { contains: "Male" }.

On the client side you can then call JSON.parse(language) to restore the object.
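The round trip can be mimicked in plain Python, as a rough sketch only: `json.dumps` and `json.loads` stand in for stringify and JSON.parse, and the `in` test stands in for a substring contains filter. The variable names are illustrative.

```python
import json

# The nested value, as it would look before being stored.
languages = [{"language": "Male", "proficiency": "Female"}]

# Store the whole list as a single string attribute.
item = {"language": json.dumps(languages)}

# A substring test on the stored string, similar in spirit to a contains filter.
matches = "Male" in item["language"]

# On the client side, turn the string back into a structure.
restored = json.loads(item["language"])

print(matches, restored == languages)
```

A substring match cannot tell keys from values or one field from another, so it can return false positives, and it still requires a scan.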

Upvotes: 0

notionquest

Reputation: 39166

An index cannot be created on nested attributes (i.e. document data types in DynamoDB).

Document Types – A document type can represent a complex structure with nested attributes—such as you would find in a JSON document. The document types are list and map.

Query API:-

A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.

Scan API:-

A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.

To use the Query API, the hash key value is required. The OP gives no indication that the hash key value is available. According to the OP, the data needs to be queried by the location attribute, which sits inside a DynamoDB List data type. So the next option to look at is a GSI.

Kindly read more about GSIs. One of the rules is that a GSI can be created using top-level attributes only. So, location can't be used to create the index.

So, creating the GSI in order to use Query API has been ruled out as well.

The index key attributes can consist of any top-level String, Number, or Binary attributes from the base table; other scalar types, document types, and set types are not allowed.

For the reasons mentioned above, the Query API can't be used to get the data based on the location attribute, assuming the hash key value is not available.

If the hash key value is available, a FilterExpression can be used to filter the data. The only way to filter data inside the complex list data type is the CONTAINS function. To use CONTAINS, all the attributes of the list element must match (i.e. seminar_id, location, duration and all other attributes). So, it is definitely not possible to fulfil the use case mentioned in the OP using the current data model.
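The whole-element matching described above can be mimicked in plain Python to show why a partial match on location fails. This is only an analogy, not DynamoDB code; `list_contains` is a hypothetical helper.

```python
def list_contains(attribute_list, operand):
    """True only if some list element equals the operand as a whole."""
    return any(element == operand for element in attribute_list)


seminars = [
    {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre",
     "registrations": "92", "attendees": "61"},
    {"seminar_id": "XXXAAA-54321", "location": "Starbucks",
     "registrations": "40"},
]

# Knowing only the location is not enough to match an element.
partial = {"location": "Nelson Mandela Theatre"}

# Every attribute of the element has to be supplied for a match.
full = {"seminar_id": "XXXYYY-12345", "location": "Nelson Mandela Theatre",
        "registrations": "92", "attendees": "61"}

print(list_contains(seminars, partial))
print(list_contains(seminars, full))
```

The first check is false and the second is true, which is why the filter is of no use when only the location is known.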

Proposed alternate solution:-

Re-modeling the data structure as mentioned below could be an option to resolve the problem. There is definitely no other solution available to fulfil the use case using Query API.

Main Table :-

Course Id - Hash Key

seminar_id - Sort Key

GSI :-

Seminar location - Hash Key

Course Id - Sort Key

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.

Now you can use the Query API on the GSI to get the data where the seminar location equals Nelson Mandela Theatre. You can also include the course id in the query if you know its value. The query will potentially return multiple items in the result set. You can use a FilterExpression if you would like to filter the data further on non-key attributes.
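A query against such a GSI might look roughly like the sketch below, which only builds the request parameters and aggregates already-fetched items. The table name `Seminars`, the index name `location-course-index`, and both helper functions are assumptions. The items passed to `attendance_rate` are assumed to be plain dictionaries, and total attendees over total registrations is just one possible reading of "average attendance".

```python
from typing import Any, Dict, Iterable, Optional


def build_location_query(location: str) -> Dict[str, Any]:
    """Build low-level Query parameters that target the GSI, not the base table."""
    return {
        "TableName": "Seminars",               # hypothetical table name
        "IndexName": "location-course-index",  # hypothetical GSI name
        "KeyConditionExpression": "#loc = :loc",
        # A placeholder avoids any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#loc": "location"},
        "ExpressionAttributeValues": {":loc": {"S": location}},
    }


def attendance_rate(items: Iterable[Dict[str, str]]) -> Optional[float]:
    """Total attendees divided by total registrations.

    Seminars without an attendees value are skipped.
    """
    counted = [i for i in items if "attendees" in i and "registrations" in i]
    registrations = sum(int(i["registrations"]) for i in counted)
    if registrations == 0:
        return None
    return sum(int(i["attendees"]) for i in counted) / registrations


params = build_location_query("Nelson Mandela Theatre")

# Items as the index might return them for that location (values from the question).
theatre_items = [
    {"registrations": "92", "attendees": "61"},
    {"registrations": "244", "attendees": "240"},
    {"registrations": "44", "attendees": "44"},
]
print(params["IndexName"], attendance_rate(theatre_items))
```

With boto3, a dictionary of this shape could be passed to a DynamoDB client as `client.query(**params)`. The response items would then need converting from the typed attribute format before being aggregated.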

Upvotes: 13
