Edgar Moreno

Reputation: 23

MongoDB aggregate group by project and then sum of weeks hours

I have this entry structure:

{
    "_id" : ObjectId("56de0178cf7970ac2a86fb23"),
    "createdAt" : ISODate("2016-03-07T16:32:24.681-06:00"),
    "updatedAt" : ISODate("2016-03-07T16:32:24.681-06:00"),
    "yearTask" : 2016,
    "startWeek" : 10,
    "task" : "31231321",
    "hours" : 312,
    "project" : [ 
        {
            "Project" : "1000G",
            "_id" : "565f193cea6493ce0acc9730"
        }
    ],
    "plannedWeeks" : [ 
        {
            "yearTask" : 2016,
            "hours" : 3,
            "weekNumber" : 10
        }, 
        {
            "yearTask" : 2016,
            "hours" : 3,
            "weekNumber" : 11
        }, 
        {
            "yearTask" : 2016,
            "hours" : 3,
            "weekNumber" : 12
        }, 
        {
            "yearTask" : 2016,
            "hours" : 3,
            "weekNumber" : 13
        }, 
        {
            "yearTask" : 2016,
            "hours" : 3,
            "weekNumber" : 14
        }
    ],
}

Imagine that I have other entries like this one. I need the total sum of hours for each week (weekNumber), grouped by project (the project name is stored in the "Project" field). The number of weeks is variable. The project field is an array, but it only ever contains one project.

The output would look like this:

{
   _id : {
           "name" : "1000G",
            "yearTask" : 2016,
            "weeks" : [ 
                    {
                        "yearTask" : 2016,
                        "hours" : 34, <--Total sum for this project and week
                        "weekNumber" : 10
                    }

                 .... etc.

             ]

        },
   _id : {
           "name" : "Project2",
            "yearTask" : 2016,
            "weeks" : [ 
                    {
                        "yearTask" : 2016,
                        "hours" : 584,<--Total sum for this project and week
                        "weekNumber" : 10
                    }

                 .... etc.

             ]

        } 

}

My current query only groups the planned weeks by project:

db.tasks.aggregate(
   [
        { "$unwind": "$project" },
        {$group : {
           _id : { 
               name : "$project.Project", 
               yearTask : "$yearTask",  
               weeks : "$plannedWeeks",

            },
            /*"matches" : { "$sum" : "$plannedWeeks.hours" },*/
        }},
        { $match : { "_id.yearTask": { $eq: 2016 } } },

   ]
)

I tried to use { "$unwind": "$plannedWeeks" }, but I don't know how to sum the hours for every week and then group the weekly totals by project.

Edit: my solution was:

db.tasks.aggregate([
    { "$match": { "yearTask": 2016 } },
    { "$unwind": "$project" },
    { "$unwind": "$plannedWeeks" },
    /*{ "$match" : { "yearTask": 2016 } },*/
    {
        "$group": {
            "_id": {
                "name": "$project.Project",
                /*"yearTask": "$plannedWeeks.yearTask",*/
                "weekYear": "$plannedWeeks.yearTask",
                "weekNumber": "$plannedWeeks.weekNumber"
            },
            "weeks": {
                "$push": {
                    "yearTask": "$plannedWeeks.yearTask",
                    "weekNumber": "$plannedWeeks.weekNumber"
                }
            },
            "hours": { "$sum": "$plannedWeeks.hours" }
        }
    },
    { "$sort": { "_id.weekYear": 1, "_id.weekNumber": 1 } },
    { "$group": {
        "_id": {
            "name": "$_id.name"
            /*"yearTask": "$_id.yearTask",*/
        },
        "weeks": {
            "$push": {
                 "yearTask": "$_id.weekYear",
                 "hours": "$hours",
                 "weekNumber": "$_id.weekNumber"
            }
        }
    }}
])

Upvotes: 2

Views: 2651

Answers (2)

Blakes Seven

Reputation: 50416

You want two $group stages: the first totals the hours per project and week, and the second uses $push to collect those weekly totals into an array under each project key.

Ideally with $arrayElemAt from MongoDB 3.2:

db.tasks.aggregate([
    { "$unwind": "$plannedWeeks" },
    { "$group": {
        "_id": {
            "name": { "$arrayElemAt": [ "$project.Project", 0 ] },
            "yearTask": "$yearTask",
            "weekNumber": "$plannedWeeks.weekNumber"
        },
        "hours": { "$sum": "$plannedWeeks.hours" }
    }},
    { "$group": {
        "_id": {
            "name": "$_id.name",
            "yearTask": "$_id.yearTask",
        },
        "weeks": {
            "$push": {
                 "yearTask": "$_id.yearTask",
                 "hours": "$hours",
                 "weekNumber": "$_id.weekNumber"
            }
        }
    }}
])

And of course, since "project" is an array with only one item, there is no problem using $unwind on it as well, which also works in earlier versions:

db.tasks.aggregate([
    { "$unwind": "$plannedWeeks" },
    { "$unwind": "$project" },
    { "$group": {
        "_id": {
            "name": "$project.Project",
            "yearTask": "$yearTask",
            "weekNumber": "$plannedWeeks.weekNumber"
        },
        "hours": { "$sum": "$plannedWeeks.hours" }
    }},
    { "$group": {
        "_id": {
            "name": "$_id.name",
            "yearTask": "$_id.yearTask",
        },
        "weeks": {
            "$push": {
                 "yearTask": "$_id.yearTask",
                 "hours": "$hours",
                 "weekNumber": "$_id.weekNumber"
            }
        }
    }}
])

At any rate, it's two $group stages where the first does the sum and the next creates the array.
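
To make the two steps concrete, here is a rough plain-JavaScript sketch that mimics them on in-memory data, without MongoDB. The sample tasks and the function name weeklyHoursByProject are made up for illustration; only the field names come from the question.

```javascript
// Illustrative sample data (not from the question's collection).
const tasks = [
  { yearTask: 2016, project: [{ Project: "1000G" }],
    plannedWeeks: [{ hours: 3, weekNumber: 10 }, { hours: 3, weekNumber: 11 }] },
  { yearTask: 2016, project: [{ Project: "1000G" }],
    plannedWeeks: [{ hours: 5, weekNumber: 10 }] },
  { yearTask: 2016, project: [{ Project: "Project2" }],
    plannedWeeks: [{ hours: 7, weekNumber: 10 }] }
];

// Hypothetical helper that imitates the pipeline's two grouping steps.
function weeklyHoursByProject(docs) {
  // Step 1: like $unwind plus the first $group,
  // total the hours per (name, yearTask, weekNumber).
  const totals = new Map();
  for (const doc of docs) {
    const name = doc.project[0].Project;
    for (const week of doc.plannedWeeks) {
      const key = JSON.stringify([name, doc.yearTask, week.weekNumber]);
      totals.set(key, (totals.get(key) || 0) + week.hours);
    }
  }

  // Step 2: like the second $group,
  // push each weekly total into the array of its project.
  const projects = new Map();
  for (const [key, hours] of totals) {
    const [name, yearTask, weekNumber] = JSON.parse(key);
    const id = JSON.stringify([name, yearTask]);
    if (!projects.has(id)) {
      projects.set(id, { _id: { name, yearTask }, weeks: [] });
    }
    projects.get(id).weeks.push({ yearTask, hours, weekNumber });
  }
  return [...projects.values()];
}

const result = weeklyHoursByProject(tasks);
console.log(JSON.stringify(result, null, 2));
```

With this sample data, week 10 of "1000G" totals 8 hours because two tasks contribute 3 and 5 hours to it.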

It's probably a good idea to reconsider the usage of an array for "project" if it's only ever going to contain one element. Multiple arrays in documents can cause problems if you expect some sort of correlation between the data contained, and that is generally better expressed in a single array instead, or as just a base property, even nested.
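
As a sketch of that suggestion, assuming you are free to change the schema: the document below is a hypothetical reshaped version of the task from the question, with "project" stored as a single embedded document. With that shape, the first $group can read the name directly, with no $unwind or $arrayElemAt.

```javascript
// Hypothetical reshaped task: "project" is one embedded document, not an array.
const task = {
  yearTask: 2016,
  project: { Project: "1000G", _id: "565f193cea6493ce0acc9730" },
  plannedWeeks: [
    { yearTask: 2016, hours: 3, weekNumber: 10 },
    { yearTask: 2016, hours: 3, weekNumber: 11 }
  ]
};

// First $group stage for that shape: the project name is a plain nested field.
const firstGroup = {
  $group: {
    _id: {
      name: "$project.Project",
      yearTask: "$yearTask",
      weekNumber: "$plannedWeeks.weekNumber"
    },
    hours: { $sum: "$plannedWeeks.hours" }
  }
};
```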

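As always, put $match first in the aggregation pipeline if you intend to filter documents by some condition. Filtering before $unwind and $group means fewer documents to process, and a $match at the start of the pipeline can use an index.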
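
For example, a version of the pipeline with the filter up front might look like the sketch below. The year 2016 is taken from the question. The $sort stage is an addition, on the assumption that you want the weeks pushed in ascending order, since $group by itself does not promise any output order.

```javascript
// Sketch: filter first, then unwind, total per week, and collect per project.
const pipeline = [
  { $match: { yearTask: 2016 } },
  { $unwind: "$plannedWeeks" },
  { $group: {
      _id: {
        name: { $arrayElemAt: ["$project.Project", 0] },
        yearTask: "$yearTask",
        weekNumber: "$plannedWeeks.weekNumber"
      },
      hours: { $sum: "$plannedWeeks.hours" }
  }},
  // Assumed requirement: weeks should appear in ascending order.
  { $sort: { "_id.weekNumber": 1 } },
  { $group: {
      _id: { name: "$_id.name", yearTask: "$_id.yearTask" },
      weeks: {
        $push: {
          yearTask: "$_id.yearTask",
          hours: "$hours",
          weekNumber: "$_id.weekNumber"
        }
      }
  }}
];

// In the mongo shell, run it with: db.tasks.aggregate(pipeline)
```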

Upvotes: 1

chridam

Reputation: 103425

Consider running the following aggregation pipeline to get the correct result:

pipeline = [
    { "$match" : { "plannedWeeks.yearTask": 2016 } },
    { "$unwind": "$project" },
    { "$unwind": "$plannedWeeks" },
    { "$match" : { "plannedWeeks.yearTask": 2016 } },
    {
        "$group": {
            "_id": {
                "name": "$project.Project",
                "yearTask": "$plannedWeeks.yearTask",
                "weekNumber": "$plannedWeeks.weekNumber"
            },
            "weeks": {
                "$push": {
                    "yearTask": "$plannedWeeks.yearTask",                   
                    "weekNumber": "$plannedWeeks.weekNumber"
                }
            },
            "totalHours": { "$sum": "$plannedWeeks.hours" },            
        }
    }
]
db.tasks.aggregate(pipeline)

Upvotes: 0
