Reputation: 3828
I want to find ALL records where NO item in EITHER array (things1 or things2) has a "color" value. (so ALL color are either '' or null or does not exist)
Here is the data structure -
{
_id: objectId(),
things1: [
{
height:
weight:
color:
}
],
things2: [
{
size:
flavor:
color:
}
]
}
I have tried this:
.find({
$and: [
{
things1:{
$elemMatch: {
$or: [
{
color: ''
},
{
color: null
},
{
color: { $exists: false }
}
]
}
}
},
{
things2:{
$elemMatch: {
$or: [
{
color: ''
},
{
color: null
},
{
color: { $exists: false }
}
]
}
}
}
]
})
This pulls back ANY records where ANY item has a blank, null or non-existent "color"... meaning if there are thee things in "things1" and two of the three have a color and the third is an empty string, and all the items in "things2" have a color - the above query will return this document
Where I want ONLY records where ALL the "color" fields are blank, null or non-existant.
Any help appreciated thanks
Upvotes: 2
Views: 85
Reputation: 3010
We can think of a reverse route i.e. a string that must be either null, not present, or empty is basically a string without characters. We can calculate NOR of ( a filled color present in things1 array, filled color present in things2 array).
For example:
db.collection.find({
$nor:[
{
"things1.color":/.+/
},
{
"things2.color":/.+/
}
]
}).pretty()
Upvotes: 3
Reputation: 151112
I'm going to simplify down your dataset just to make the example a bit easier for readers to comprehend, but the logic still remains the same
{ "a" : [ { "sample" : 1 } ], "b" : [ { "sample" : 1 } ] } // No color
{ "a" : [ { "color" : "b" } ], "b" : [ { "color" : "b" } ] } // YES - Valid color
{ "b" : [ { "color" : "" }, { "color" : "a" } ] } // Empty color
{ "b" : [ { "color" : null }, { "color" : "a" } ] } // Null color
{ "b" : [ { "sample" : 1 } ] } // No color
{ "b" : [ { "color" : "b" } ] } // YES - Valid Color
{ "a" : [ { "color" : "" }, { "color" : "a" } ] } // Empty color
{ "a" : [ { "color" : null }, { "color" : "a" } ] } // Null Color
{ "a" : [ { "sample" : 1 } ] } // No color
{ "a" : [ { "color" : "b" } ] } // YES - Valid color
That basically has all the same combinations you are looking for, as in there are two properties where either or both contain objects in an array where you want ALL items in the array to have a color
property either:
In short, only three of those documents above qualify. So here's how to obtain them:
db.collection.find({
"$or": [
{
"a.color": { "$exists": true },
"a": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
},
{
"b.color": { "$exists": true },
"b": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
}
]
})
Under the $or
we have two paired conditions. Being separately ( very important ) to test for the existence of the named path, and then to look for cases where ANY objects in the array match the conditions which would exclude them ( null or empty ) and the reject those documents using the $not
expression.
The $or
is only used for representing the pairs against each discreet field holding an array.
The results are of course:
{ "a" : [ { "color" : "b" } ], "b" : [ { "color" : "b" } ] }
{ "b" : [ { "color" : "b" } ] }
{ "a" : [ { "color" : "b" } ] }
Being only those documents where either or both of the supplied outer keys contained arrays where ALL elements have a color
property with a value not null
.
In the event you actually do mean that the fields like things1
and things2
can change from document to document in a dynamic way and possibly have a things3
and you want ALL properties with their containing arrays to match conditions, then you basically are out of luck for a standard query and would need to fall back to aggregate()
.
In the example here if we added a document like:
{ "a": [{ "color": "b" }, "c": [{ "color": "" }] }
Then the basic query form shown above would still return this document, so instead you would use aggregate()
:
db.collection.aggregate([
{ "$addFields": {
"comb": {
"$reduce": {
"input": {
"$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$in": [ "$$this.k", [ "a", "b", "c" ] ] }
}
},
"as": "el",
"in": {
"$map": {
"input": "$$el.v",
"in": {
"$mergeObjects": [
{ "type": "$$el.k" },
"$$this"
]
}
}
}
}
},
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}},
{ "$match": {
"comb.color": { "$exists": true },
"comb": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
}},
{ "$addFields": {
"comb": "$$REMOVE"
}}
])
But that's really not desirable. Note in order to dynamically traverse keys you need the $objectToArray
in order to effectively turn all the keys in the document into a single array entry themselves. Then of course $filter
for the expected keys, or alternately just reverse the logic to exclude things like _id
and other values that would not apply.
This then would merge those arrays together whilst re-mapping the name of the key
as a property within the array. Noting mostly that the real main point here is rather than "a.color"
and "b.color"
, we just have a single path now of "comb"
representing the combined mapping.
It would produce the expected result, but the real solution here should be seen in the implemented logic of the pipeline, and that is instead of several document properties with arrays, the better approach is a single array which only used thing1
or thing2
( or in this case "a"
or "b"
or "c"
) to be just another value of a consistently named property within that array.
So this form of storing your data is far more efficient:
{ "data" : [ { "type" : "a", "color" : "" }, { "type" : "a", "color" : "a" }, { "type" : "b", "color" : "" }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "color" : null }, { "type" : "a", "color" : "a" }, { "type" : "b", "color" : null }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "sample" : 1 }, { "type" : "b", "sample" : 1 } ] }
{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "b", "color" : "" }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "b", "color" : null }, { "type" : "b", "color" : "a" } ] }
{ "data" : [ { "type" : "b", "sample" : 1 } ] }
{ "data" : [ { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "" }, { "type" : "a", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "color" : null }, { "type" : "a", "color" : "a" } ] }
{ "data" : [ { "type" : "a", "sample" : 1 } ] }
{ "data" : [ { "type" : "a", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "c", "color" : "" } ] }
If that was your collection structure then the ALL array elements query becomes very simple, and is just basically the very last stage of the aggregate()
which essentially translated your existing structure into this form:
db.collection.find({
"data.color": { "$exists": true },
"data": { "$not": { "$elemMatch": { "color": { "$in": [null, ""] } } } }
})
And then it's still essentially the same three documents that qualify:
{ "data" : [ { "type" : "a", "color" : "b" }, { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "b", "color" : "b" } ] }
{ "data" : [ { "type" : "a", "color" : "b" } ] }
Noting here that a consistent path name like "data.type"
has many advantages that also make that name useful to query options. You can just as cleanly add it to filter conditions if you just want to look for things of "type" where equal to thing1
, and it also makes updating documents a lot easier.
Worth consideration, since basic query forms that do not rely on manipulation from aggregate()
statements perform far better in the long run.
Upvotes: 1