nan zang
nan zang

Reputation: 23

How to query sub document (list type) in Cosmos Db

I have a collection in Azure CosmosDB, and each document looks like this:

{
  "id": "random",
  "TeacherName": "Ben",
  "Students": [
    {
      "Name": "John",
      "Telephone": ""
    },
    {
      "Name": "Mary",
      "Telephone": ""
    }
  ],
}

TeacherName is string, Students is a list of student

I need to do: Given a user name (user1), query and return all the documents, either teacher name is "user1" or there is a student with name "user1".

I tried a few options, but cannot do it.

The closest solution I found so far is to use .SelectMany(), but I found .SelectMany will do a join and will duplicate the return results.

This is my query:

client.CreateDocumentQuery().SelectMany((x) => x.Students.Where(s=>s.Name == "user1" || x.TeacherName == "user1")   

If only the above sample document in the collection, and when I searched user name "Ben", 2 records will be returned ((number of result) * (number of students)). I have to remove the duplicate at client side and pagination is kind of broken.

Is it possible to issue a single query to achieve what I need?

Upvotes: 2

Views: 5281

Answers (1)

Jay Gong
Jay Gong

Reputation: 23782

client.CreateDocumentQuery().SelectMany((x) => x.Students.Where(s=>s.Name == "user1" || x.TeacherName == "user1")

Actually the SelectMany method in your code is similar to the sql as below:

SELECT c.id from c
join x  in c.Students
where c.TeacherName ='Ben' or x.Name = 'Ben'

Output

[
  {
    "id": "1"
  },
  {
    "id": "1"
  }
]

If there's a join, there's going to be duplicate data. As far as I know, automatic removal of duplicate data is not supported (like Distinct Keywords in traditional database) by Azure Cosmos DB yet.

It seems that there is no way to remove the repeat data in the query SQL level.

If you don't want to handle with your query result set in the loop locally, I strongly suggest you using a stored procedure to handle with result data in Azure Cosmos DB to release the pressure on your local server.

Or , you could complete the query directly in the Stored Procedure if your data is not too large.Please refer to the js code as below:

// SAMPLE STORED PROCEDURE
function sample(idsArrayString,courses) {
    var collection = getContext().getCollection();

    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        'SELECT * FROM root r',
        function (err, feed, options) {
            if (err) throw err;
            if (!feed || !feed.length) getContext().getResponse().setBody('no docs found');
            else {

                var returnResult = [];
                for(var i = 0;i<feed.length;i++){
                    var doc = feed[i];
                    if(doc.TeacherName == 'Ben'){
                        returnResult.push(feed[i]);
                        break;
                    }

                    var std = doc.Students;
                    for (var s in std) {
                        if(s.Name == 'Ben'){
                            returnResult.push(feed[i]);
                            break;
                        }
                    } 
                }
                getContext().getResponse().setBody(returnResult);
            }
        });

    if (!isAccepted) throw new Error('The query was not accepted by the server.');
}

Update Answer:

I checked the Azure Cosmos DB pricing details. Then it doesn't show that stored procedure is more expensive than a single query. Actually, the cost depends on Rus and RuS depends on the complexity of the query and the amount of concurrency, etc.

You could refer to the RUs document. Also , you could know the RUs charge by the http request header :x-ms-request-charge. Please see this useful trace: How to caculate the Azure Cosmos DB RU used in server side scripting.

Hope it helps you.

Upvotes: 4

Related Questions