Reputation: 3
I have a mongo collection containing documents that have sub documents that I want to sum the fields of - an example of what I am hoping to achieve is below
The general structure of each document is
{
"pool" : "Foo",
"monthly-figures" : {
"1": {
"a" : 311,
"b" : 1481,
...
"x" : {"a" : 311, "b" : 19.965999999999998},
"y" : {"a" : 200, "b" : 14.174000000000003
}
},
"2": {
"a" : 500,
"b" : 100,
...
"x" : {"a" : 123, "b" : 198},
"y" : {"a" : 200, "b" : 13.7}
},
... // May not all be present
"12": {...}
}
}
}
The reason monthly is stored as an object as opposed to an array is that some months may not be present.
Take for example the three documents
{
"pool" : "Foo",
"monthly-figures" : {
"1": {
"a" : 10,
"b" : 20,
...
"x" : {"a" : 15, "b" :30}
}
},
"2": {
"a" : 500,
"b" : 100,
...
"x" : {"a" : 40, "b" : 50},
},
"7": {
"a": 300,
"b": 90,
...
"x": {"a": 4, "b": 5}
}
}
}
{
"pool" : "Foo",
"monthly-figures" : {
"1": {
"a" : 15,
"b" : 25,
...
"x" : {"a" : 20, "b" : 35},
},
"2": {
"a" : 250,
"b" : 200,
...
"x" : {"a" : 60, "b" : 80},
}
}
}
{
"pool" : "Bar",
"monthly-figures" : {
"1": {
"a" : 300,
"b" : 400,
...
"x" : {"a" : 51, "b" : 3},
}
},
"6": {
"a" : 75,
"b" : 135,
...
"x" : {"a" : 12.5, "b" : 16},
}
}
}
What I want to achieve through aggregation is to group based on the pool
field, then sum the values contained within the monthly-figures
- so the resulting two documents would look like
{
"pool" : "Foo",
"monthly-figures" : {
"1": {
"a" : 25,
"b" : 45,
...
"x" : {"a" : 35, "b" : 65},
},
"2": {
"a" : 750,
"b" : 300,
...
"x" : {"a" : 100, "b" : 130},
},
"7": {
"a": 300,
"b": 90,
...
"x": {"a": 4, "b": 5}
}
}
}
(Document with pool
of Bar would be identical as there is only 1)
It doesn't matter if a month has all 0 values after the aggregation (if say the month doesn't exist on any document of the grouping) but ideally it wouldn't?
I've come up with this query which does work, but I feel is not the best way to do it - lots of repetition - how can I improve?
{$group: {
// Group to pool
_id: "$pool",
// Sum grouped documents
"1a": {$sum: "$monthly-figures.1.a"},
"1b": {$sum: "$monthly-figures.1.b"},
...
"1xa": {$sum: "$monthly-figures.1.x.a"},
"1xb": {$sum: "$monthly-figures.1.x.b"},
"2a": {$sum: "$monthly-figures.2.a"},
... Continue all the way down to 12
}
},
{$project: {
"_id": 0,
"pool": "$_id",
"monthly-figures": {
"1": {
"a": "$1a",
"b": "$1b",
...
"x": {
"a": "$1xa",
"b": "$1xb"
}
},
"2": {
"a": "$2a",
...
}
... Continue all the way down to 12
}
}
}
Any ideas on a cleaner pipeline? Cheers!
Upvotes: 0
Views: 735
Reputation: 1402
One way to eliminate the need to list each month is to turn the monthly-figures object into an array. You said:
The reason monthly is stored as an object as opposed to an array is that some months may not be present.
But an array would still work because you could have:
[{month: 1, figures: {...}}, {month: 6, figures: {...}}]
With monthly-figures an array, we can $unwind
each element of the array into its own document. Now it's possible to do a $group
on both the pool and month to get the sums. To gather up the months for each pool we can do another $group
on just the pool and $push
specially formed objects, containing the month and figures, onto an array called monthly-figures. Those objects are special because the k and v keys are recognized by the $arrayToObject
operator that's used in the next stage to get back the original form. Here's the query:
db.colx.aggregate([{
"$project": {
"pool": 1,
"monthly-figures": {"$objectToArray": "$monthly-figures"}
}
}, {
"$unwind": "$monthly-figures"
}, {
"$group": {
"_id": {
"pool": "$pool",
"month": "$monthly-figures.k",
},
"a": {"$sum": "$monthly-figures.v.a"},
"b": {"$sum": "$monthly-figures.v.b"},
"x_a": {"$sum": "$monthly-figures.v.x.a"},
"x_b": {"$sum": "$monthly-figures.v.x.b"}
}
}, {
"$group": {
"_id": "$_id.pool",
"monthly-figures": {
"$push": {
"k": "$_id.month",
"v": {
"a": "$a",
"b": "$b",
"x": {
"a": "$x_a",
"b": "$x_b"
}
}
}
}
}
}, {
"$project": {
"_id": 0,
"pool": "$_id",
"monthly-figures": {"$arrayToObject": "$monthly-figures"}
}
}])
Here's the output from the query:
{
"pool" : "Bar",
"monthly-figures" : {
"6" : {
"a" : 75,
"b" : 135,
"x" : {
"a" : 12.5,
"b" : 16
}
},
"1" : {
"a" : 300,
"b" : 400,
"x" : {
"a" : 51,
"b" : 3
}
}
}
}
{
"pool" : "Foo",
"monthly-figures" : {
"1" : {
"a" : 25,
"b" : 45,
"x" : {
"a" : 35,
"b" : 65
}
},
"2" : {
"a" : 750,
"b" : 300,
"x" : {
"a" : 100,
"b" : 130
}
},
"7" : {
"a" : 300,
"b" : 90,
"x" : {
"a" : 4,
"b" : 5
}
}
}
}
Links to MongoDB Documentation:
Upvotes: 1