Reputation: 3828

Find ALL records where NO item in arrays has a value

I want to find ALL records where NO item in EITHER array (things1 or things2) has a "color" value. (so ALL color are either '' or null or does not exist)

Here is the data structure -

{
    _id: objectId(),
    things1: [
        {
            height:
            weight:
            color:
        }
    ],
    things2: [
        {
            size:
            flavor:
            color:
        }
    ]
}

I have tried this:

.find({
    $and: [
        {
            things1:{ 
                $elemMatch: { 
                    $or: [
                        {
                            color: ''
                        },
                        {
                            color: null
                        },
                        {
                            color: { $exists: false }
                        }
                    ]
                } 
            } 
        },
        {
            things2:{ 
                $elemMatch: { 
                    $or: [
                        {
                            color: ''
                        },
                        {
                            color: null
                        },
                        {
                            color: { $exists: false }
                        }
                    ]
                } 
            } 
        }
    ]
})

This pulls back ANY records where ANY item has a blank, null or non-existent "color"... meaning if there are thee things in "things1" and two of the three have a color and the third is an empty string, and all the items in "things2" have a color - the above query will return this document

Where I want ONLY records where ALL the "color" fields are blank, null or non-existant.

Any help appreciated thanks

Upvotes: 2

Answers (2)

Himanshu Sharma

Reputation: 3010

We can think of a reverse route i.e. a string that must be either null, not present, or empty is basically a string without characters. We can calculate NOR of ( a filled color present in things1 array, filled color present in things2 array).

For example:

db.collection.find({
    $nor:[
        {
            "things1.color":/.+/
        },
        {
            "things2.color":/.+/
        }
    ]
}).pretty()

Upvotes: 3

Neil Lunn

Reputation: 151112

I'm going to simplify down your dataset just to make the example a bit easier for readers to comprehend, but the logic still remains the same

{ "a" : [ { "sample" : 1 } ], "b" : [ { "sample" : 1 } ] }       // No color
{ "a" : [ { "color" : "b" } ], "b" : [ { "color" : "b" } ] }     // YES - Valid color
{ "b" : [ { "color" : "" }, { "color" : "a" } ] }                // Empty color 
{ "b" : [ { "color" : null }, { "color" : "a" } ] }              // Null color
{ "b" : [ { "sample" : 1 } ] }                                   // No color
{ "b" : [ { "color" : "b" } ] }                                  // YES - Valid Color
{ "a" : [ { "color" : "" }, { "color" : "a" } ] }                // Empty color
{ "a" : [ { "color" : null }, { "color" : "a" } ] }              // Null Color
{ "a" : [ { "sample" : 1 } ] }                                   // No color
{ "a" : [ { "color" : "b" } ] }                                  // YES - Valid color

That basically has all the same combinations you are looking for, as in there are two properties where either or both contain objects in an array where you want ALL items in the array to have a color property either:

existing
Not null
Not empty

In short, only three of those documents above qualify. So here's how to obtain them:

db.collection.find({
  "$or": [
    {
      "a.color": { "$exists": true },
      "a": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
    },
    {
      "b.color": { "$exists": true },
      "b": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
    }
  ]
})

Under the $or we have two paired conditions. Being separately ( very important ) to test for the existence of the named path, and then to look for cases where ANY objects in the array match the conditions which would exclude them ( null or empty ) and the reject those documents using the $not expression.

The $or is only used for representing the pairs against each discreet field holding an array.

The results are of course:

{ "a" : [ { "color" : "b" } ], "b" : [ { "color" : "b" } ] }
{ "b" : [ { "color" : "b" } ] }
{ "a" : [ { "color" : "b" } ] }

Being only those documents where either or both of the supplied outer keys contained arrays where ALL elements have a color property with a value not null.

In the event you actually do mean that the fields like things1 and things2 can change from document to document in a dynamic way and possibly have a things3 and you want ALL properties with their containing arrays to match conditions, then you basically are out of luck for a standard query and would need to fall back to aggregate().

In the example here if we added a document like:

{ "a": [{ "color": "b" }, "c": [{ "color": "" }] }

Then the basic query form shown above would still return this document, so instead you would use aggregate():

db.collection.aggregate([
  { "$addFields": {
    "comb": {
      "$reduce": {
        "input": {
          "$map": {
            "input": {
              "$filter": {
                "input": { "$objectToArray": "$$ROOT" },
                "cond": { "$in": [ "$$this.k", [ "a", "b", "c" ] ] }
              }
            },
            "as": "el",
            "in": {
              "$map": {
                "input": "$$el.v",
                "in": {
                  "$mergeObjects": [
                    { "type": "$$el.k" },
                    "$$this"
                  ]
                }
              }
            }
          }
        },
        "initialValue": [],
        "in": { "$concatArrays": [ "$$value", "$$this" ] }
      }
    }
  }},
  { "$match": {
    "comb.color": { "$exists": true },
    "comb": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
  }},
  { "$addFields": {
    "comb": "$$REMOVE"
  }} 
])

But that's really not desirable. Note in order to dynamically traverse keys you need the $objectToArray in order to effectively turn all the keys in the document into a single array entry themselves. Then of course $filter for the expected keys, or alternately just reverse the logic to exclude things like _id and other values that would not apply.

This then would merge those arrays together whilst re-mapping the name of the key as a property within the array. Noting mostly that the real main point here is rather than "a.color" and "b.color", we just have a single path now of "comb" representing the combined mapping.

It would produce the expected result, but the real solution here should be seen in the implemented logic of the pipeline, and that is instead of several document properties with arrays, the better approach is a single array which only used thing1 or thing2 ( or in this case "a" or "b" or "c" ) to be just another value of a consistently named property within that array.

So this form of storing your data is far more efficient:

{ "data" : [ { "type" : "a", "color" : "" }, { "type" : "a", "color" : "a" }, { "type" : "b", "color" : "" }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "color" : null }, { "type" : "a", "color" : "a" }, { "type" : "b", "color" : null }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "sample" : 1 }, { "type" : "b", "sample" : 1 } ] }
{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "b", "color" : "" }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "b", "color" : null }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "b", "sample" : 1 } ] }
{ "data" : [ { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "" }, { "type" : "a", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "color" : null }, { "type" : "a", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "sample" : 1 } ] }
{ "data" : [ { "type" : "a", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "c", "color" : "" } ] }

If that was your collection structure then the ALL array elements query becomes very simple, and is just basically the very last stage of the aggregate() which essentially translated your existing structure into this form:

db.collection.find({
  "data.color": { "$exists": true },
  "data": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
})

And then it's still essentially the same three documents that qualify:

{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "b" } ] }

Noting here that a consistent path name like "data.type" has many advantages that also make that name useful to query options. You can just as cleanly add it to filter conditions if you just want to look for things of "type" where equal to thing1, and it also makes updating documents a lot easier.

Worth consideration, since basic query forms that do not rely on manipulation from aggregate() statements perform far better in the long run.

Upvotes: 1

Find ALL records where NO item in arrays has a value

Answers (2)

Related Questions