manojpt
manojpt

Reputation: 1269

get the fields in the input document after $group in aggregation pipeline in MongoDB

The problem I am facing is that how to access the original document after one group operation and also carry the fields after $group in aggregation pipeline in MongoDB.

For example: [ group, unwind, group]

Original document is:

{
"_id" : ObjectId("361de42f1938e89b179dda42"),
"user_id" : ObjectId("9424021bafbde55512e39b83"),
"candidate_id" : ObjectId("54f65356294160421ead3ca1")
"OVERALL_SCORE" : 150,
"SCORES" : [ 
    { "NAME" : "asd", "OBTAINED_SCORE" : 28}, { "NAME" : "acd", "OBTAINED_SCORE" : 36 }, { "NAME" : "abc", "OBTAINED_SCORE" : 40}
 ]
}

Aggregation function:

 db.coll.aggregate([ $group : { _id : { user_id : "$user_id"}, BEST_SCORE : { $max : "$OVERALL_SCORE"}, AVG_SCORE : { $avg : "$OVERALL_SCORE" }}} ])

Below is the sample output (after 1st group):

{
"result" : [ 
    {
        "_id" : {
            "user_id" : ObjectId("9424021bafbde55512e39b83")
        },
        "BEST_SCORE" : 150,
        "AVG_SCORE" : 132
    }
],
"ok" : 1
 }

Question is: (I don't know whether it can be achievable) I want the fields in the original documents (input to aggregation).

For ex: 1) unwind "SCORES" which is in the original document and next group by "candidate_id", "user_id".

2) I want "BEST_SCORE", "AVG_SCORE" (after 1st group) fields to access in the 2nd group also.

Aggregation function should look something like this:

   db.coll.aggregate([ $group : { _id : { user_id : "$user_id"}, BEST_SCORE : { $max : "$OVERALL_SCORE"}, AVG_SCORE : { $avg : "$OVERALL_SCORE" }}}, { $unwind : "$SCORES"}, /*problem is--after group operation "SCORES" field which is in original document not available */ { $group : _id : { NAME: "$SCORES.NAME"}, AVG_OBTAINED_SCORE: { $avg : "$SCORES.OBTAINED_SCORE"}} **/*problem is--this is also in the original document*/** ])

Output should look like this:

   "BEST_SCORE": 150,                     //after 1st group
  "AVG_SCORE": 132,                       //after 1st group
  "SCORES": [                             //problem --- unwind "SCORES" and then group which is actually will not be available after 1st group (get this from original document)
    {
      "NAME": "abc",
      "AVG_OBTAINED_SCORE": 25.5
    },
    {
      "NAME": "asd",
      "AVG_OBTAINED_SCORE": 24
    },
    {
      "NAME": "acd",
      "AVG_OBTAINED_SCORE": 32
    }
  ]

Could any one please help me out.

Thank you

Upvotes: 2

Views: 4200

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151132

When grouping with something that you want to retain the values for all considered documents in a group you need to use $push. Catch is, that this is an array. So you process $unwind twice, and also have two $group stages:

db.coll.aggregate([
    {  "$group" : { 
        "_id": "$user_id", 
        "BEST_SCORE": { "$max": "$OVERALL_SCORE" },
        "AVG_SCORE": { "$avg": "$OVERALL_SCORE" },
        "SCORES": { "$push": "SCORES" }
    }}, 

    // SCORES in an array of arrays. Unwind twice
    { "$unwind": "$SCORES" },
    { "$unwind": "$SCORES" },

    // Group for averages on elements
    { "$group": {
        "_id": {
            "user_id": "$_id",
            "NAME": "$SCORES.name"
        },
        "BEST_SCORE": { "$first": "$BEST_SCORE" },
        "AVG_SCORE": { "$first": "$AVG_SCORE" }
        "AVG_OBTAINED_SCORE": { "$avg": "$SCORES.OBTAINED_SCORE" } 
    }},

    // Group to user_id
    { "$group": {
        "user_id": "$_id.user_id",
        "BEST_SCORE": { "$first": "$BEST_SCORE" },
        "AVG_SCORE": { "$first": "$AVG_SCORE" }
        "SCORES": { "$push": {
            "NAME": "$_id.NAME",
            "AVG_OBTAINED_SCORE": "$AVG_OBTAINED_SCORE"
        }}     
    }}
])

You might be tempted to consider using $unwind before the first $group, but if you did then the calculated averages would be affected by the number of elements present in the array that was being "un-wound". So the "double $unwind" is a necessary process here.

Upvotes: 4

Related Questions