Auresol
Auresol

Reputation: 25

Preserve all groups generate by first $group when using the second $group in MongoDB aggregation

I have one collection that the structure of the record look like this,

{
  _id: ObjectId('66a544dc3a8dff9aaaf65343'),
  user_type: "USER",
  status: CREATE
}

The explanation of each field:

I want to run the aggregate to get this output

{
  {
    _id: "ALL",
    stat: {
      "CREATE" : 2,
      "ACTIVE" : 5,
      "DELETE" : 4
    }
    "total": 11
  {
    _id: "USER",
    stat: {
      "CREATE" : 1,
      "ACTIVE" : 2,
      "DELETE" : 0
    },
    "total": 3
  },
  {
    _id: "ADMIN",
    stat: {
      "CREATE" : 0,
      "ACTIVE" : 3,
      "DELETE" : 0
    },
    "total": 3
  },
  {
    _id: "DEV",
    stat: {
      "CREATE" : 1,
      "ACTIVE" : 0,
      "DELETE" : 4
    },
    "total": 5
  }
}

I want to count each occurrence of status for all user_type, compute total number of all status, and create a TOTAL document to count the entire collection

I can easily generate all documents except _id: "ALL" document easily using $group with $project. The aggregate command look like this,

db.getCollection('accounts').aggregate(
  [
    {
      $group: {
        _id: '$user_type',
        CREATE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'CREATE']
              },
              1,
              0
            ]
          }
        },
        ACTIVE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'ACTIVE']
              },
              1,
              0
            ]
          }
        },
        DELETE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'DELETE']
              },
              1,
              0
            ]
          }
        },
      }
    },
    {
      $project: {
        stat: {
          CREATE: '$CREATE',
          ACTIVE: '$ACTIVE',
          DELETE: '$DELETE',
        },
        total: {
          $sum: [
            '$CREATE',
            '$ACTIVE',
            '$DELETE',
          ]
        }
      }
    }
  ],
); 

The problem is the second $group command with _id: null

The problem arise when I want to generate another document. I can put another $group command (using _id: null), but it will delete all others record (the only thing left will be the record which _id: null)

Is there a way to preserve the previous records in pipeline? or is there an alternative solution which is better that this? any comment is welcome.

Thank you very much!

Upvotes: 1

Views: 53

Answers (3)

Shubham
Shubham

Reputation: 1426

If you are using mongodb 6 or greater then following code will work as well

db.getCollection("accounts").aggregate([{
    
    $group:{
        _id:{user_type:"$user_type",status:"$status"},
        count:{$sum:1}
    }
    
},{
    $group:{
        _id:"$_id.user_type",
        total:{$sum:"$count"},
        status:{$push:{k:"$_id.status",v:"$count"}}
    }
},{
    
    $facet:{
        "all":[{
            $group:{
                _id:"ALL",
                total:{$sum:"$total"},
                stat:{
                    $push:"$status"
                }
                
            }
        },{
            $addFields:{
                stat:{
                    $function:{
                        body:`function(stat){var obj = {};for(let s of stat)for(let o of s)obj[o.k]= +(obj[o.k] ? obj[o.k] : 0) + +o?.v ;return obj}`,
                        args:["$stat"],
                        lang:"js"
                    }
                }
            }
        }],
        "others":[{
            $addFields:{
                stat:{$arrayToObject:"$status"}
            }
        }]
    }
    
},{
    $replaceRoot:{
        newRoot:{result:{$concatArrays:["$all","$others"]}}
    }
}])

Pros:- You can add new status without any hard coding. There is no need to do same group repeatedly as you can utilise previous information.

Cons:- Can be used with latest versions of mongodb. Need to allow server side javascript. Pre computed pattern is still the winner.

Upvotes: 0

Add the following pipelines to the aggregation.

  {
    $group: {
      _id: "ALL",

      CREATE: { $sum: "$stat.CREATE" },
      ACTIVE: { $sum: "$stat.ACTIVE" },
      DELETE: { $sum: "$stat.DELETE" },
      total: { $sum: "$total" },
      groups: { $push: "$$ROOT" },
    },
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE",
      },
      total: 1,
      groups: 1,
    },
  },

In the groups field, you can find the document by $user_type.

Upvotes: 0

ray
ray

Reputation: 15217

I can think of 2 ways to do it.

  1. use $facet to perform the grouping sum twice. This might involve more data wrangling at later stages, but this should be the most performant and the canonical way to do what you expect.
db.accounts.aggregate([
  {
    "$facet": {
      "accounts": [
        {
          "$group": {
            "_id": "$user_type",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ],
      "all": [
        {
          "$group": {
            "_id": "ALL",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ]
    }
  },
  {
    "$project": {
      "docs": {
        "$concatArrays": [
          "$accounts",
          "$all"
        ]
      }
    }
  },
  {
    "$unwind": "$docs"
  },
  {
    "$replaceRoot": {
      "newRoot": "$docs"
    }
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE"
      },
      total: {
        $sum: [
          "$CREATE",
          "$ACTIVE",
          "$DELETE"
        ]
      }
    }
  }
])

Mongo Playground


  1. use $$unionWith after your current $group stage to perform the grouping sum again.
db.accounts.aggregate([
  {
    "$group": {
      "_id": "$user_type",
      CREATE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "CREATE"
              ]
            },
            1,
            0
          ]
        }
      },
      ACTIVE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "ACTIVE"
              ]
            },
            1,
            0
          ]
        }
      },
      DELETE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "DELETE"
              ]
            },
            1,
            0
          ]
        }
      }
    }
  },
  {
    "$unionWith": {
      "coll": "accounts",
      "pipeline": [
        {
          "$group": {
            "_id": "ALL",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ]
    }
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE"
      },
      total: {
        $sum: [
          "$CREATE",
          "$ACTIVE",
          "$DELETE"
        ]
      }
    }
  }
])

Mongo Playground


Strictly speaking, both of them are just workarounds and do not actually address the need to reuse previously computed results. Although not likely, they might incur performance issues since they are grouping twice. If the data does not change frequently or you have more tolerance for stale data, you may consider materializing the computed result and perform further grouping on it.

Upvotes: 0

Related Questions