I have a Mongo aggregation in which I want to use $group the same way GROUP BY works in SQL.
This isn't working for me unless I set the _id of the new document to one of the grouping fields, which doesn't suit me. I am also not able to get the values I want, which come from potentially THREE documents that I am merging together in Mongo.
To illustrate the grouping and selection I am basing my Mongo aggregation on, in SQL I would write something like:
SELECT entity_id, connection_id, cycle_id, objectOriginAPI, accountBalance
FROM raw_originBusinessData
WHERE objectStatus = 'UNPROCESSED'
  AND (objectOriginAPI = 'Profit & Loss'
    OR objectOriginAPI = 'Balance Sheet'
    OR objectOriginAPI = 'Bank Summary')
GROUP BY entity_id, connection_id, cycle_id;
I have paraphrased my Mongo script below to simplify what it does with the embedded arrays.
db.getCollection('raw_originBusinessData').aggregate([
    { "$match": {
        objectStatus : "UNPROCESSED"
        , $or: [
            { objectOriginAPI : "Profit & Loss"}
            ,{objectOriginAPI : "Balance Sheet"}
            ,{objectOriginAPI : "Bank Summary"}
        ]}
    },
    // don't worry about this, this is all good
    { "$unwind": "$objectRawOriginData.Reports" }
    ,{ "$unwind": "$objectRawOriginData.Reports.Rows" }
    ,{ "$unwind": "$objectRawOriginData.Reports.Rows.Rows" },
    // this is where I believe I'm having my problem
    { "$group": {"_id": "$entity_id"
        // , "$connection_id"
        // , "objectCycleID"
        , "accountBalances": { "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value" }
    }},
    {$project: {objectClass: {$literal: "Source Data"}
        , objectCategory: {$literal: "Application"}
        , objectType: {$literal: "Account Balances"}
        , objectOrigin: {$literal: "Xero"}
        , entity_ID: "$_id"
        , connection_ID: "$connection_ID"
        , accountBalances: "$accountBalances"}
    }
    // ,{$out: "std_sourceBusinessData"}
])
All of the documents I am combining into a single document share the same entity_id, connection_id and cycle_id, and I want to carry those values into the new document. I also want the new document to have its own unique _id (an ObjectId).
Your help is very much appreciated. The Mongo documentation doesn't seem to cover anything about $group other than that _id is mandatory, but if I don't set the _id to a field I want to group by (in the script above it is set to entity_id), the documents don't group properly.
Put simply, the _id needs to be a "composite" value, made up of three "sub-keys":
{ "$group": {
    // Composite key: one group per distinct combination of the three fields
    "_id": {
        "entity_id": "$entity_id",
        "connection_id": "$connection_id",
        "objectCycleID": "$objectCycleID"
    },
    "accountBalances": {
        "$push": "$objectRawOriginData.Reports.Rows.Rows.Cells.Value"
    }
}},
{ "$project": {
    "_id": 0,
    "objectClass": { "$literal": "Source Data" },
    "objectCategory": { "$literal": "Application" },
    "objectType": { "$literal": "Account Balances" },
    "objectOrigin": { "$literal": "Xero" },
    // The grouped values now live under _id, so reference them as "$_id.<field>"
    "entity_ID": "$_id.entity_id",
    "connection_ID": "$_id.connection_id",
    "objectCycleID": "$_id.objectCycleID",
    "accountBalances": "$accountBalances"
}}
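If it helps to see the semantics outside the database, the following plain JavaScript (Node.js) sketch mimics what a $group on a composite _id with $push does. It is an illustration only, not how MongoDB implements it: the function groupComposite and the sample rows are hypothetical, and each unwound row is assumed to carry its balance in a flat value field instead of objectRawOriginData.Reports.Rows.Rows.Cells.Value.

```javascript
// Hypothetical sketch: emulate
//   { $group: { _id: { entity_id, connection_id, objectCycleID },
//               accountBalances: { $push: <value> } } }
// in plain JavaScript. This is NOT MongoDB's implementation.
function groupComposite(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the composite _id with a fixed key order so that equal
    // combinations always serialize to the same Map key.
    const id = {
      entity_id: doc.entity_id,
      connection_id: doc.connection_id,
      objectCycleID: doc.objectCycleID,
    };
    const key = JSON.stringify(id);
    if (!groups.has(key)) {
      groups.set(key, { _id: id, accountBalances: [] });
    }
    // Equivalent of $push: append every value, duplicates included.
    groups.get(key).accountBalances.push(doc.value);
  }
  return [...groups.values()];
}

// Hypothetical input: three unwound rows spread over two cycles.
const rows = [
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1, value: "100.00" },
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1, value: "250.50" },
  { entity_id: "E1", connection_id: "C1", objectCycleID: 2, value: "75.25" },
];

const grouped = groupComposite(rows);
console.log(grouped.length); // 2
console.log(JSON.stringify(grouped, null, 2));
```

Each distinct (entity_id, connection_id, objectCycleID) combination becomes one output document with all of its values pushed into accountBalances. Unlike this sketch, MongoDB does not guarantee the order of the groups it returns, and the order of pushed values follows whatever order the documents reach $group in.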
And then, of course, referencing any of those values in the later $project now requires the $_id. prefix (for example "$_id.entity_id"), since _id is now the parent key.
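The $project step can be sketched the same way. Again this is plain JavaScript standing in for the pipeline stage, with a hypothetical projectGroup helper and sample input. It also copies objectCycleID across, on the assumption that the cycle ID should end up in the new document too, as the question asks.

```javascript
// Hypothetical sketch of the $project stage: copy the sub-keys of the
// composite _id to top-level fields, add the constant ($literal) fields,
// and drop _id (the "_id": 0 in the pipeline).
function projectGroup(group) {
  return {
    objectClass: "Source Data",
    objectCategory: "Application",
    objectType: "Account Balances",
    objectOrigin: "Xero",
    // "$_id.entity_id" reads entity_id from inside the composite _id
    entity_ID: group._id.entity_id,
    connection_ID: group._id.connection_id,
    objectCycleID: group._id.objectCycleID,
    accountBalances: group.accountBalances,
  };
}

// Hypothetical output document of the $group stage: only _id and
// accountBalances exist at the top level.
const groupDoc = {
  _id: { entity_id: "E1", connection_id: "C1", objectCycleID: 1 },
  accountBalances: ["100.00", "250.50"],
};

const projected = projectGroup(groupDoc);
console.log(projected);

// A top-level lookup finds nothing, just as a top-level "$connection_ID"
// reference has nothing to read after $group.
console.log(groupDoc.connection_ID); // undefined
```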
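Just as in any MongoDB document, the _id does not have to be a single scalar value; it can be an embedded document (a valid BSON object). So in this case, the combination means "group on all these field values": documents end up in the same group only when entity_id, connection_id and objectCycleID all match.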
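To show what "group on all these field values" means in practice, this sketch counts the groups that a single-field key and the three-field composite key produce for the same hypothetical documents. countGroups is an illustrative helper, not a MongoDB API, and JSON-encoding the key values is just a convenient way to compare them here.

```javascript
// Hypothetical sketch: count the distinct group keys produced by a
// single-field _id versus a composite _id, emulating $group key matching.
function countGroups(docs, keyFields) {
  const keys = new Set();
  for (const doc of docs) {
    // Composite key: the values of all keyFields, in a fixed order.
    keys.add(JSON.stringify(keyFields.map((field) => doc[field])));
  }
  return keys.size;
}

// Hypothetical documents: same entity, different connections and cycles.
const docs = [
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1 },
  { entity_id: "E1", connection_id: "C1", objectCycleID: 1 },
  { entity_id: "E1", connection_id: "C2", objectCycleID: 1 },
  { entity_id: "E1", connection_id: "C1", objectCycleID: 2 },
];

const byEntity = countGroups(docs, ["entity_id"]);
const byAllThree = countGroups(docs, ["entity_id", "connection_id", "objectCycleID"]);
console.log(byEntity, byAllThree); // 1 3
```

Grouping on entity_id alone collapses everything into one group, while the composite key keeps each connection and cycle separate.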