MongoDB mapReduce() query to aggregate data into a list inside a record

Question

I have a MongoDB collection with records of the form:

{
    "_id" : ObjectId("562d6d9c3a2e9c0adbb02f14"),
    "slug" : "1:955553",
    "subslug" : "1:955553:02",
    "score" : "0.615",
    "position_start" : "1",
    "position_end" : 955553,
    "name" : "AGRN",
    "ref" : "A"
},
{
    "_id" : ObjectId("562d6d9c3a2e9c0adbb02f15"),
    "slug" : "2:15553",
    "subslug" : "2:15553:01",
    "score" : "0.915",
    "position_start" : "1002",
    "position_end" : 15553,
    "name" : "MMFR",
    "ref" : "C"
}
{
    "_id" : ObjectId("562d6d9c3a2e9c0adbb02f16"),
    "slug" : "1:955553",
    "subslug" : "1:955553:01",
    "score" : "0.715",
    "position_start" : 1,
    "position_end" : 955553,
    "name" : "AGRN",
    "ref" : "A"
},

I would like to aggregate this collection and group by slug (notice the first and third records here have the same slug).

I am trying to aggregate my data into a new collection that looks like:

{
    "_id" : "",
    "slug" : "1:955553",
    "components" : [ 
       { 
         "subslug": "1:955553:01",
         "score": 0.615,
         "position_start": 1,
         "position_end": 955553,
         "name": AGRN,
         "ref": "A"
       },
       { 
         "subslug": "1:955553:02",
         "score": 0.715,
         "position_start": 1,
         "position_end": 955553,
         "name": AGRN,
         "ref": "A"
       },

    ]
},
{
    "_id" : "",
    "slug" : "2:15553",
    "components" : [ 
       { 
         "subslug": "2:15553:01",
         "score": 0.915,
         "position_start": 1002,
         "position_end": 15553,
         "name": MMFR,
         "ref": "C"
       }
    ]
}

I am trying to use mapReduce() to accomplish this, but I am having some difficulties working out the details of the functions.

I have the following query:

db.getCollection('vest').mapReduce(
    function() {
        emit(this.slug, { 'components': {$push: this } });
    },
    function(key, components) {
        return components[0];
    },
    {
        out: 'mytable'
    }
)

Yet this unfortunately builds a table that looks like:

{
    "_id" : "1:955553",
    "value" : {
        "components" : {
            "$push" : {
                "_id" : ObjectId("562d6d9c3a2e9c0adbb02f14"),
                "slug" : "1:955553",
                "subslug" : "1:955553:01",
                "position_start" : 1,
                "position_end" : 955553,
                "gene" : "AGRN",
                "ref" : "A"
            }
        }
    }
}

And this is not what I need. I was trying to use $push to append a components array, but clearly $push is not respected in mapReduce().

Can anyone give me any pointers on how to take the input collection data above and create the desired output collection? Am I on the right track with my mapReduce() query?

chridam · Accepted Answer

Best to use the aggregation framework for such operation which should be multiple times faster than the map-reduce operation.

Typically you would construct an aggregation pipeline that consists of 3 stages:

$group stage - This pipeline step groups the documents by the slug field as its key, then apply the accumulator operator $push to create the components array that is a result of applying an expression to each document in the above group.
$project stage - This will reshape each document in the stream, such as by adding new fields or removing existing fields.
$out stage - This final step will writes the resulting documents of the aggregation pipeline to a new collection.

So, using the above operation, running the following aggregation pipeline will give you the desired result in a new collection called mytable:

db.vest.aggregate([
    {
        "$group": {
            "_id": "$slug",
            "components": {
                "$push": {
                    "subslug": "$subslug",
                     "score": "$score",
                     "position_start": "$position_start",
                     "position_end": "$position_end",
                     "name": "$name",
                     "ref": "$ref"
                }
            }
        }
    },
    {
        "$project": {
            "_id": 0, "slug": "$_id", "components": 1
        }
    },
    { "$out": "mytable" }
])

Querying this collection with the above sample data

db.mytable.find()

will give you the desired output:

Sample Output:

/* 0 */
{
    "_id" : ObjectId("563bc608d1f71f49c3d6c80b"),
    "components" : [ 
        {
            "subslug" : "2:15553:01",
            "score" : "0.915",
            "position_start" : "1002",
            "position_end" : 15553,
            "name" : "MMFR",
            "ref" : "C"
        }
    ],
    "slug" : "2:15553"
}

/* 1 */
{
    "_id" : ObjectId("563bc608d1f71f49c3d6c80c"),
    "components" : [ 
        {
            "subslug" : "1:955553:02",
            "score" : "0.615",
            "position_start" : "1",
            "position_end" : 955553,
            "name" : "AGRN",
            "ref" : "A"
        }, 
        {
            "subslug" : "1:955553:01",
            "score" : "0.715",
            "position_start" : 1,
            "position_end" : 955553,
            "name" : "AGRN",
            "ref" : "A"
        }
    ],
    "slug" : "1:955553"
}

MongoDB mapReduce() query to aggregate data into a list inside a record

Answers (2)

Related Questions