Pymongo count elements collected out of all documents with key

Question

I want to count all elements that occur in somekey across a MongoDB collection.

The current code treats the whole somekey array in each document as a single value.

from pymongo import Connection

con = Connection()
db = con.database

collection = db.collection

from bson.code import Code
reducer = Code("""
  function(obj, prev){
  prev.count++;
  }
  """)

from bson.son import SON
results = collection.group(key={"somekey":1}, condition={}, initial={"count": 0}, reduce=reducer)
for doc in results:
  print doc

However, I want it to count each individual element that occurs in somekey in any document.

Here is an anticipated example. The MongoDB has the following documents.

{ "_id" : 1, "somekey" : ["AB", "CD"], "someotherkey" : "X" }
{ "_id" : 2, "somekey" : ["AB", "XY"], "someotherkey" : "Y" }

The result should be a list ordered by count:

count: 2 "AB"
count: 1 "CD"
count: 1 "XY"

Neil Lunn · Accepted Answer

The .group() method will not work on elements that are arrays, and the closest similar thing would be mapReduce where you have more control over the emitted keys.

But really the better fit here is the aggregation framework. It is implemented in native code and does not rely on the JavaScript interpreter, as the other methods do.

You won't be getting an "ordered list" from MongoDB responses, but you get a similar document result:

results = collection.aggregate([
    # Unwind the array
    { "$unwind": "$somekey" },

    # Group the results and count
    { "$group": {
        "_id": "$somekey",
        "count": { "$sum": 1 }
    }}
])

Gives you something like:

{ "_id": "AB", "count": 2 }
{ "_id": "CD", "count": 1 }
{ "_id": "XY", "count": 1 }
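
Since the question asks for results ordered by count, a `$sort` stage can be appended after `$group`. The following is only a sketch: the `pipeline` variable and the `count_elements_locally` helper are illustrative names, not part of PyMongo, and the helper is a plain-Python stand-in for the unwind-then-group logic so the expected counts can be checked without a running server. It assumes Python 3.7+, where dicts keep insertion order, which matters for the key order in `$sort`.

```python
from collections import Counter

# Aggregation pipeline: unwind the array, count each element,
# then order by count (descending) and by value for ties.
pipeline = [
    {"$unwind": "$somekey"},
    {"$group": {"_id": "$somekey", "count": {"$sum": 1}}},
    {"$sort": {"count": -1, "_id": 1}},
]


def count_elements_locally(docs, key="somekey"):
    """Hypothetical helper: mimic the pipeline on a list of dicts."""
    # "Unwind" by flattening every array under `key`, then count.
    counts = Counter(value for doc in docs for value in doc.get(key, []))
    # Sort by count descending, then by value ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"_id": value, "count": count} for value, count in ordered]


# The two sample documents from the question.
docs = [
    {"_id": 1, "somekey": ["AB", "CD"], "someotherkey": "X"},
    {"_id": 2, "somekey": ["AB", "XY"], "someotherkey": "Y"},
]

local_result = count_elements_locally(docs)
for row in local_result:
    print(row)

# Against a live server, the equivalent call would be:
#   results = collection.aggregate(pipeline)
```

The local helper is only there for checking expectations; on real data the pipeline should be run on the server so the counting happens inside MongoDB.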
