Reputation: 6144
I can reference the values of individual attributes in a MongoDB aggregation pipeline using the '$' operator. But how do I access (reference) the whole document?
UPDATE: Added an example to explain the scenario.
Here's an example of what I'm trying to do. I have a collection of tweets, and every tweet has a member 'clusters', which indicates which cluster(s) the tweet belongs to.
{
    "_id" : "5803519429097792069",
    "text" : "The following vehicles/owners have been prosecuted by issuing notice on the basis of photographs on dated... http://t.co/iic1Nn85W5",
    "oldestts" : "2013-02-28 16:11:32.0",
    "firstTweetTime" : "4 hours ",
    "id" : "307161122191065089",
    "isLoc" : true,
    "powertweet" : true,
    "city" : "new+delhi",
    "latestts" : "2013-02-28 16:35:05.0",
    "no" : 0,
    "ts" : 1362081807.9693,
    "clusters" : [
        {
            "participationCoeff" : 1,
            "clusterID" : "5803519429097792069"
        }
    ],
    "username" : "dtptraffic",
    "verbSet" : [
        "date",
        "follow",
        "prosecute",
        "have",
        "be"
    ],
    "timestamp" : "4 hours ",
    "entitySet" : [ ],
    "subCats" : {
        "Generic" : [ ]
    },
    "lang" : "en",
    "fns" : 18.35967,
    "url" : "url|109|131|http://fb.me/2CeaI7Vtr",
    "cat" : [
        "Generic"
    ],
    "order" : 7
}
Since there are a couple of hundred thousand tweets in my collection, I want to group all tweets by 'clusters.clusterID'. Basically, I want to write a query like the following:
db.tweets.aggregate (
{ $group : { _id : '$clusters.clusterID', 'members' : {$addToSet : <????> } } }
)
I want to reference the document currently being processed at the spot marked <????> in the query above. Does anyone know how to do this?
Upvotes: 17
Views: 8456
Reputation: 1108
Use the $$ROOT variable:
References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage.
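Applied to the query in the question, this might look like the sketch below. The $unwind stage is an addition that is not in the original query: it assumes you want one group per cluster ID rather than one group per array of IDs, and it means $$ROOT here is the unwound tweet (its 'clusters' field is a single object, not an array). $$ROOT requires MongoDB 2.6 or later.

```javascript
// Sketch only: collection and field names are taken from the question.
var pipeline = [
    // Assumption: split each tweet into one document per cluster it belongs to.
    { $unwind: "$clusters" },
    // Group by cluster ID and collect the whole (unwound) document via $$ROOT.
    { $group: {
        _id: "$clusters.clusterID",
        members: { $addToSet: "$$ROOT" }
    } }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.tweets.aggregate(pipeline).forEach(printjson);
}
```

Each resulting group is a single document, so a cluster with very many tweets can still exceed the 16MB document size limit.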
Upvotes: 27
Reputation: 405
I think MapReduce is more useful for this task.
As Asya Kamsky pointed out in the comments, my example was incorrect for MongoDB; please refer to the official MongoDB docs.
Upvotes: -2
Reputation: 42352
There is currently no mechanism to access the full document in the aggregation framework. If you only need a subset of fields, you could do:
db.tweets.aggregate([
    { $group: {
        _id: '$clusters.clusterID',
        members: { $addToSet: {
            user: "$username",  // the sample document stores the user as 'username'
            text: "$text"       // etc. for the subset of fields you want
        } }
    } }
])
Don't forget that with a few hundred thousand tweets, aggregating full documents will run you into the 16MB limit on the result document returned by the aggregation framework.
You can do this via MapReduce like this:
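var m = function () {
    var doc = this;
    // 'clusters' is an array, so emit once per cluster the tweet belongs to
    doc.clusters.forEach(function (c) {
        emit(c.clusterID, { members: [doc] });
    });
};
var r = function (key, values) {
    // merge the member lists of all partial results for this cluster ID
    var res = { members: [] };
    values.forEach(function (val) {
        res.members = res.members.concat(val.members);
    });
    return res;
};
db.tweets.mapReduce(m, r, { out: "output" });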
Upvotes: 2