VaidAbhishek

Reputation: 6144

Referencing the whole document in MongoDB Aggregation Pipeline

I can reference the values of individual attributes in a MongoDB aggregation pipeline using the '$' operator. But how do I access (reference) the whole document?


UPDATE: An example provided to explain scenario.

Here's an example of what I'm trying to do. I have a collection of tweets, and every tweet has a member 'clusters', which indicates which cluster(s) that tweet belongs to.

{
    "_id" : "5803519429097792069",
    "text" : "The following vehicles/owners have been prosecuted by issuing notice on the basis of photographs on dated... http://t.co/iic1Nn85W5",
    "oldestts" : "2013-02-28 16:11:32.0",
    "firstTweetTime" : "4 hours ",
    "id" : "307161122191065089",
    "isLoc" : true,
    "powertweet" : true,
    "city" : "new+delhi",
    "latestts" : "2013-02-28 16:35:05.0",
    "no" : 0,
    "ts" : 1362081807.9693,
    "clusters" : [
        {
            "participationCoeff" : 1,
            "clusterID" : "5803519429097792069"
        }
    ],
    "username" : "dtptraffic",
    "verbSet" : [
        "date",
        "follow",
        "prosecute",
        "have",
        "be"
    ],
    "timestamp" : "4 hours ",
    "entitySet" : [ ],
    "subCats" : {
        "Generic" : [ ]
    },
    "lang" : "en",
    "fns" : 18.35967,
    "url" : "url|109|131|http://fb.me/2CeaI7Vtr",
    "cat" : [
        "Generic"
    ],
    "order" : 7
} 

Since there are a couple of hundred thousand tweets in my collection, I want to group all tweets by 'clusters.clusterID'. Basically, I want to write a query like the following:

db.tweets.aggregate (
{ $group : { _id : '$clusters.clusterID', 'members' : {$addToSet : <????> } } }
)

I want to access the document currently being processed and reference it where I have put <????> in the above query. Does anyone know how to do this?

Upvotes: 17

Views: 8456

Answers (3)

Volox

Reputation: 1108

Use the $$ROOT variable:

References the root document, i.e. the top-level document, currently being processed in the aggregation pipeline stage.
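As a minimal sketch, the query from the question could be completed by putting "$$ROOT" where the <????> placeholder was. This assumes MongoDB 2.6 or later (the release that introduced $$ROOT) and the `tweets` collection from the question; the `pipeline` variable name is only for illustration. The `db` guard lets the snippet load outside the mongo shell without failing.

```javascript
// Group tweets by cluster ID and collect each whole document via $$ROOT.
// Assumption: MongoDB 2.6+; "tweets" is the collection from the question.
const pipeline = [
  {
    $group: {
      _id: "$clusters.clusterID",
      // $$ROOT refers to the top-level document entering this stage
      members: { $addToSet: "$$ROOT" },
    },
  },
];

// Only run the query when a mongo shell "db" handle is available.
if (typeof db !== "undefined") {
  printjson(db.tweets.aggregate(pipeline).toArray());
}
```

Because 'clusters' is an array, '$clusters.clusterID' evaluates to an array of IDs, so tweets are grouped by their full list of cluster IDs. Adding a { $unwind: "$clusters" } stage before the $group would instead produce one group per individual cluster.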

Upvotes: 27

Silver_Clash

Reputation: 405

I think MapReduce is more useful for this task.

As Asya Kamsky pointed out in the comments, my example was incorrect for MongoDB; please refer to the official MongoDB docs instead.

Upvotes: -2

Asya Kamsky

Reputation: 42352

There is currently no mechanism to access the full document in the aggregation framework. If you only need a subset of fields, you could do:

db.tweets.aggregate([
    { $group: {
        _id: '$clusters.clusterID',
        members: { $addToSet: {
            user: "$username",   // the sample document stores the name in "username"
            text: "$text"        // etc. for the subset of fields you want
        } }
    } }
])

Don't forget that with a few hundred thousand tweets, aggregating the full documents will run you into the 16MB limit on the result document returned by the aggregation framework.

You can do this via MapReduce like this:

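var m = function() {
  // "clusters" is an array, so emit the tweet once per cluster it belongs to
  var doc = this;
  doc.clusters.forEach(function (c) {
    emit(c.clusterID, {members: [doc]});
  });
};

var r = function(k, v) {
  // merge the member lists of all values emitted for this key
  var res = {members: []};
  v.forEach(function (val) {
    res.members = val.members.concat(res.members);
  });
  return res;
};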

db.tweets.mapReduce(m, r, {out:"output"});
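To see what shape ends up in the "output" collection, here is a rough plain-JavaScript simulation of the map and reduce steps over an in-memory array. It is only a sketch of the idea, not how MongoDB executes mapReduce: the three sample tweets and the helper names (`mapTweet`, `reduceMembers`, `groupByCluster`) are made up for illustration, and each tweet is emitted once per entry in its 'clusters' array.

```javascript
// Hypothetical tweets, trimmed to the fields that matter here.
const tweets = [
  { _id: "a", text: "first", clusters: [{ participationCoeff: 1, clusterID: "c1" }] },
  { _id: "b", text: "second", clusters: [{ participationCoeff: 1, clusterID: "c1" }] },
  { _id: "c", text: "third", clusters: [{ participationCoeff: 1, clusterID: "c2" }] },
];

// Map step: one [key, value] pair per cluster the tweet belongs to.
function mapTweet(tweet) {
  return tweet.clusters.map((c) => [c.clusterID, { members: [tweet] }]);
}

// Reduce step: merge the member lists emitted for one key.
function reduceMembers(key, values) {
  const res = { members: [] };
  values.forEach((val) => {
    res.members = res.members.concat(val.members);
  });
  return res;
}

// In-memory stand-in for the grouping MongoDB does between map and reduce.
function groupByCluster(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    for (const [key, value] of mapTweet(doc)) {
      if (!buckets.has(key)) buckets.set(key, []);
      buckets.get(key).push(value);
    }
  }
  const out = {};
  for (const [key, values] of buckets) {
    out[key] = reduceMembers(key, values);
  }
  return out;
}

const grouped = groupByCluster(tweets);
console.log(Object.keys(grouped));
```

The value emitted by the map step has the same shape as the value returned by the reduce step ({members: [...]}), which is what allows the reduce function to be applied repeatedly to partial results.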

Upvotes: 2
