Matt Lightbourn

Reputation: 597

Mongo aggregate group by multiple values

I have a Mongo aggregation in which I want to use $group the same way I would use GROUP BY in SQL.

This only works if I set the _id of the new document to one of the grouping fields, which is not what I need. I also cannot get the values I want, which come from potentially THREE documents that I am merging together in Mongo.

In SQL, I would write something like the following to illustrate the grouping and selection that my Mongo aggregation is based on:

SELECT entity_id, connection_id, cycle_id, objectOriginAPI,accountBalance
FROM raw_originBusinessData
WHERE objectStatus = 'UNPROCESSED'
AND (objectOriginAPI = 'Profit & Loss'
OR objectOriginAPI = 'Balance Sheet'
OR objectOriginAPI = 'Bank Summary')
GROUP BY entity_id, connection_id, cycle_id;

I have paraphrased to simplify what my Mongo script is doing with embedded arrays.

db.getCollection('raw_originBusinessData').aggregate([
    { "$match": {
        "objectStatus": "UNPROCESSED",
        "$or": [
            { "objectOriginAPI": "Profit & Loss" },
            { "objectOriginAPI": "Balance Sheet" },
            { "objectOriginAPI": "Bank Summary" }
        ]
    }},

    // don't worry about this, this is all good
    { "$unwind": "$objectRawOriginData.Reports" },
    { "$unwind": "$objectRawOriginData.Reports.Rows" },
    { "$unwind": "$objectRawOriginData.Reports.Rows.Rows" },

    // this is where I believe I'm having my problem
    { "$group": {
        "_id": "$entity_id",
        //    , "$connection_id"
        //    , "objectCycleID"
        "accountBalances": { "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value" }
    }},
    { "$project": {
        "objectClass": { "$literal": "Source Data" },
        "objectCategory": { "$literal": "Application" },
        "objectType": { "$literal": "Account Balances" },
        "objectOrigin": { "$literal": "Xero" },
        "entity_ID": "$_id",
        "connection_ID": "$connection_ID",
        "accountBalances": "$accountBalances"
    }}
 ]
      // ,{$out: "std_sourceBusinessData"}
)

Each of the documents I am combining into a single document has the same entity_id, connection_id and cycle_id, which I want to carry into the new document. I also want to ensure that the new document has its own unique object_id.

Your help is very much appreciated. The Mongo documentation doesn't seem to cover anything about $group other than that _id is mandatory, but if I don't set the _id to something I want to group by (in the above script it is set to entity_id), it doesn't group properly.

Upvotes: 1

Views: 938

Answers (1)

Blakes Seven

Reputation: 50406

Put simply, the _id needs to be a "composite" value, and therefore comprised of three "sub-keys":

{ "$group":{
    "_id": {
       "entity_id": "$entity_id",
       "connection_id": "$connection_id",
       "objectCycleID": "$objectCycleID"
    },
    "accountBalances": {
        "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value"
    }
 }},
{ "$project": {
    "_id": 0,
    "objectClass": { "$literal": "Source Data" },
    "objectCategory": { "$literal": "Application"},
    "objectType": { "$literal": "Account Balances"},
    "objectOrigin": { "$literal": "Xero"},
    "entity_ID": "$_id.entity_id",
    "connection_ID": "$_id.connection_id",
    "accountBalances": "$accountBalances"
}}

And then of course, referencing any of those values in the later $project requires that you now prefix them with $_id, as that is now the parent key.

Just as with any MongoDB document, the _id here does not have to be a single value; it can be an embedded document, which is a valid BSON object. So in this case, the combination means "group on all these field values".
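
To make the effect of a composite _id concrete, here is a rough plain JavaScript sketch that emulates the $group and $project stages above without a database. The sample documents, the flattened `value` field (standing in for the unwound Cells.Value path), and the helper name `groupByCompositeKey` are assumptions made up for illustration, not part of the original pipeline.

```javascript
// Hypothetical sample documents, as they might look after the $unwind stages.
// "value" is a stand-in for objectRawOriginData.Reports.Rows.Rows.Cells.Value.
const docs = [
  { entity_id: "e1", connection_id: "c1", objectCycleID: "2015-01", value: 100 },
  { entity_id: "e1", connection_id: "c1", objectCycleID: "2015-01", value: 250 },
  { entity_id: "e1", connection_id: "c2", objectCycleID: "2015-01", value: 75 },
];

// Emulates { $group: { _id: { <keyFields> }, accountBalances: { $push: "$value" } } }.
// Documents land in the same group only when ALL key fields match.
function groupByCompositeKey(documents, keyFields) {
  const groups = new Map();
  for (const doc of documents) {
    // Build the composite _id from the chosen sub-keys.
    const id = {};
    for (const field of keyFields) {
      id[field] = doc[field];
    }
    // Serialise the composite _id so it can be used as a Map key.
    const mapKey = JSON.stringify(id);
    if (!groups.has(mapKey)) {
      groups.set(mapKey, { _id: id, accountBalances: [] });
    }
    groups.get(mapKey).accountBalances.push(doc.value);
  }
  return [...groups.values()];
}

const grouped = groupByCompositeKey(docs, [
  "entity_id",
  "connection_id",
  "objectCycleID",
]);

// Emulates the $project stage: the sub-keys are read from under _id,
// and _id itself is left out of the result.
const projected = grouped.map((group) => ({
  entity_ID: group._id.entity_id,
  connection_ID: group._id.connection_id,
  accountBalances: group.accountBalances,
}));

console.log(JSON.stringify(projected, null, 2));
```

The first two sample documents share all three key values and so collapse into one group, while the third differs only in connection_id and forms its own. This sketch keeps groups in first-seen order purely as a side effect of using a Map; the real $group stage makes no ordering guarantee, so add a $sort if the order matters.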

Upvotes: 1
