Reputation: 43
I have a document like this:
{
timestamp: ISODate("2013-10-10T23:00:00.000Z"),
values: {
0: 25,
1: 2,
3: 16,
4: 12,
5: 10
}
}
Two questions:
0
from the nested document in values
? Upvotes: 1
Views: 2710
Reputation: 50406
You really need to change the way you are structuring your documents as what you have right now is not good. Nested objects like you have cannot be "traversed" in normal query operations, so their is no way of efficiently searching "across keys".
The only way to do this is using JavaScript evaluation of $where
, and this means "no index" can be used to optimise searching. It is also basically a "brute force" match against every document in the collection:
db.collection.find(function() {
var values = this.values;
return Object.keys(values).some(function(key) {
return values[key] == 2;
});
})
That is just to find a value of "2" in the nested key, in order to find if the "maximum" value was "2" then you would do:
db.collection.find(function() {
var values = this.values;
return Math.max.apply(null, Object.keys(values).map(function(key) {
return values[key];
})) == 2;
})
So brute force is not good. Better to structure your document with "values" as an "array". Then all native queries work fine:
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10]
}
Now you can do:
db.collection.aggregate([
{ "$match": { "values": 2 } },
{ "$unwind": "$values" },
{ "$group": {
"_id": "$_id",
"timestamp": { "$first": "$timestamp" },
"values": { "$push": "$values" },
"maxVal": { "$max": "$values" }
}},
{ "$match": { "maxVal": 2 } }
])
Which might at first glance seem more cumbersome, but the construction of this using native operators as opposed to JavaScript translation does make this much more efficient. It is also notably more efficient in that it is now possible to actually search whether the "values" array actually even contains "2" as a value using an index even, without needing to test all content in looped code.
The main work is done within testing the "max" value for the array, so even if this was not the most shining example, you can see even see the clear difference in how a normal query operation can be combined with JavaScript evaluation now, to make that process faster:
db.collection.find({
"values": 2,
"$where": function() {
return Math.max.apply(null,this.values) == 2;
}
})
So the initial "values": 2
will filter the documents immediately for those that contain "2", and the subsequent expression merely filters down further those documents where the "max" value of the array is "2" as well.
Moreover, if it was your intention to query for "maximum" values like this on a regular basis, then you would be better off storing this value as a discrete field in the document itself, like so:
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10],
"minValue": 2,
"maxValue": 25
}
Then finding documents with the "maximum value" of 2 is as simple as:
db.collection.find({ "maxValue": 2 })
Or the largest "max" within all documents:
db.collection.find().sort({ "maxValue": -1 }).limit(1)
Or even both "min" and "max" from all documents at the same time:
db.collection.aggregate([
{ "$group": {
"_id": null,
"minValue": { "$min": "$minValue" },
"maxValue": { "$max": "$maxValue" }
}}
])
Maintaining this data when adding new "values" is a simple matter of employing the $min
and $max
update operators as you update the document. So to add "26" to the values:
db.collection.update(
{ "timestamp": ISODate("2013-10-10T23:00:00.000Z") },
{
"$push": { "values": 26 },
"$min": { "minValue": 26 },
"$max": { "maxValue": 26 }
}
)
Which results in only ajusting values where either $min
or $max
respectively was less than or greater than the current value.
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10, 26],
"minValue": 2,
"maxValue": 26
}
Therefore it should be clear to see why the structure is important, and that nested objects should be avoided in preference to an array where it is your intention to traverse the data, in either analysing the document itself, or indeed across multiple documents in a collection.
Upvotes: 4