nimeshkiranverma
nimeshkiranverma

Reputation: 1428

Efficient way to aggregate in Mongodb

I have a collection

{
"name" : "foo"
"clicked" : {"0":6723,"1": 1415,"2":1122}
}
{
    "name" : "bar"
    "clicked" : {"8":1423,"9": 1415,"10":1122}
    }
{
"name" : "xyz"
"clicked" : {"22":6723,"23": 1415,"2":1234}
}

clicked is basically {"position of item-clicked in the list" : "id of the item"}

The final output which I want is the total number of times an item has been clicked i.e the following for the above sample:

    {
     6723:2, 
     1415:3, 
     1423:1,
     1122:2,
     1234:1
    }

One way to do this by maintaining an in memory dict (in python script) and looking up "clicked" field in each document to update the dict. I'm new to mongo please help!

Upvotes: 2

Views: 489

Answers (3)

nimeshkiranverma
nimeshkiranverma

Reputation: 1428

I was finally able to build a map-reduce aggregation to do get my job done, without changing the schema.

var map_function = function(){ 
                      for( x in this.clicked){
                          var key = this.clicked[x]; 
                          emit(key,1);
                          } 
                      };

var reduce_function = function(a,b){
                          return Array.sum(b);
                      };
db.imp.mapReduce( map_function, reduce_function,"id").find()

Upvotes: 1

chridam
chridam

Reputation: 103365

If you could do away with the present schema and redesign it in such a way that clicked is an array that has key-value pair as its elements, then you could apply the aggregation framework to achieve the desired result.

In Mongo, you can convert the schema by iterating over the documents using the forEach() method of the find() cursor and updating the clicked field with an array of key-value pair objects:

db.collection.find().forEach(function (doc){
    var obj     = {},
        keys    = Object.keys(doc.clicked), 
            clicked = keys.map(function (key){ 
                obj.position = parseInt(key);
                obj.elementId = doc.clicked[key]
                return obj;
            }); 
    doc.clicked = clicked;
    db.collection.save(doc);
});

After changing the schema using the above, your documents would have the following structure:

{
    "name": "foo",
    "clicked": [
        { "position": 0, "elementId": 6723 },
        { "position": 1, "elementId": 1415 },
        { "position": 2, "elementId": 1122 }
    ]
},
{
    "name": "bar",
    "clicked": [
        { "position": 8, "elementId": 1423 },
        { "position": 9, "elementId": 1415 },
        { "position": 10, "elementId": 1122 }
    ]    
},
{
    "name": "xyz"
    "clicked": [
        { "position": 22, "elementId": 6723 },
        { "position": 23, "elementId": 1415 },
        { "position": 2,  "elementId": 1234 }
    ]
}

It would be quite an easy feat to get the desired aggregation by using the aggregation framework. This would entail an aggregation pipeline that consists of an $unwind and $group operators, with the $unwind as its first pipeline step. This deconstructs the clicked array field from the input documents to output a document for each element. Each output document replaces the array with an element value.

The $group operator groups the input documents by the specified elementId identifier/key and applies the accumulator expression $sum to each group that would give the count of the grouped documents:

var pipeline = [
      {
        "$unwind": "$clicked"
      },
      {
        "$group": {
          "_id": "$clicked.elementId",
          "count": {
            "$sum": 1
          }
        }
      }
    ];
    db.collection.aggregate(pipeline)

Output

/* 0 */
{
    "result" : [ 
        {
            "_id" : 1234,
            "count" : 1
        }, 
        {
            "_id" : 1423,
            "count" : 1
        }, 
        {
            "_id" : 1122,
            "count" : 2
        }, 
        {
            "_id" : 1415,
            "count" : 3
        }, 
        {
            "_id" : 6723,
            "count" : 2
        }
    ],
    "ok" : 1
}

Converting the result to an object you require only takes the map() method of the aggregation cursor result:

var result = db.test.aggregate(pipeline)
               .map(function(doc){ return {doc["_id"]: doc["count"]} });
printjson(result);

Output:

[
    {
         6723: 2, 
         1415: 3, 
         1423: 1,
         1122: 2,
         1234: 1
    }
]

Upvotes: 0

Sede
Sede

Reputation: 61225

use collections.Counter

In [58]: import pymongo

In [59]: from collections import Counter

In [61]: conn = pymongo.MongoClient()

In [62]: db = conn.test

In [63]: col = db.collection

In [64]: result = col.aggregate([{"$group": {"_id": None, "clicked": {"$push": "$clicked"}}}]).next()['clicked']

In [65]: c = Counter()

In [66]: for el in [Counter(i.values()) for i in result]:
   ....:     c += el
   ....:     

In [67]: print(dict(c))
{1122: 2, 6723: 2, 1415: 3, 1234: 1, 1423: 1}

Upvotes: 2

Related Questions