Mike
Mike

Reputation: 1738

DocumentDB filter an array by an array

I have a document that looks essentially like this:

{
     "Name": "John Smith",
     "Value": "SomethingIneed",
     "Tags: ["Tag1" ,"Tag2", "Tag3"]
 }

My goal is to write a query where I find all documents in my database whose Tag property contains all of the tags in a filter.

For example, in the case above, my query might be ["Tag1", "Tag3"]. I want all documents whose tags collection contains Tag1 AND Tag3.

I have done the following:

  1. tried an All Contains type linq query

        var tags = new List<string>() {"Test", "TestAccount"};
    
        var req =
            Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"))
            .Where(x => x.Tags.All(y => tags.Contains(y)))
            .ToList();
    
  2. Created a user defined function (I couldn't get this to work at all)

    var tagString = "'Test', 'TestAccount'";

        var req =
            Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"),
                $"Select c.Name, c.Email, c.id from c WHERE udf.containsAll([${tagString}] , c.Tags)").ToList();
    

with containsAll defined as:

function arrayContainsAnotherArray(needle, haystack){
   for(var i = 0; i < needle.length; i++){
     if(haystack.indexOf(needle[i]) === -1)
        return false;
   }
   return true;
 }
  1. Use System.Linq.Dynamic to create a predicate from a string

     var query = new StringBuilder("ItemType = \"MyType\"");
    
        if (search.CollectionValues.Any())
        {
            foreach (var searchCollectionValue in search.CollectionValues)
            {
                query.Append($" and Collection.Contains(\"{searchCollectionValue}\")");
            }
        }
    

3 actually worked for me, but the query was very expensive (more than 2000 RUs on a collection of 10K documents) and I am getting throttled like crazy. My result set for the first iteration of my application must be able to support 10K results in the result set. How can I best query for a large number of results with an array of filters?

Thanks.

Upvotes: 0

Views: 751

Answers (1)

Larry Maccherone
Larry Maccherone

Reputation: 9523

The UDF could be made to work but it would be a full table scan and so not recommended unless combined with other highly-selective criteria.

I believe the most performant (index-using) approach would be to split it into a series of AND statements. You could do this programmatically building up your query string (being careful to fully escape and user-provided data for security reasons). So, the resulting query would look like:

SELECT * 
FROM c 
WHERE 
    ARRAY_CONTAINS(c.Tags, "Tag1") AND 
    ARRAY_CONTAINS(c.Tags, "Tag3")

Upvotes: 1

Related Questions