megamoth

Reputation: 693

Mongodb MapReduce performance using Indexes

I have a sample document in MongoDB (I am still new to MongoDB):

{
    "ID": 0,
    "Facet1":"Value1",
    "Facet2":[
        {
            "Facet2Obj1":{
                "Obj1Facet1":"Value11",
                "Obj2Facet1":"Value21",
                "Obj3Facet1":"Value31"
            }   
        },
        {
            "Facet2Obj2":{
                "Obj1Facet2":"Value12",
                "Obj2Facet2":"Value22",
                "Obj3Facet2":"Value32"
            }
        },
        {
            "Facet2Obj3":{
                "Obj1Facet3":"Value13",
                "Obj2Facet3":"Value23",
                "Obj3Facet3":"Value33"
            }
        }
    ],
    "Facet3":"Value3",
    "Facet4":{
        "Facet4Obj1":{
            "Obj1Facet1":"Value4111"
        }
    }
}

The map-reduce is somewhat complex, and it produces the following output (for 30,000 documents):

{
    "_id" : "Facet1",
    "value" : [
        {
            "value" : "Value1",
            "count" : 30000,
            "ID" : [
                0,
                1,
                .
                .
                .
            ]
        }
    ]
}
{
    "_id" : "ID",
    "value" : [
        {
            "value" : 0,
            "count" : 1,
            "ID" : [
                0
            ]
        },
        {
            "value" : 1,
            "count" : 1,
            "ID" : [
                1
            ]
        },
        .
        .
        .
    ]
}
{
    "_id" : "Facet2",
    "value" : [
        {
            "value" : "Facet2Obj1",
            "count" : 30000,
            "ID" : [
                0,
                1,
                .
                .
                .
            ]
        },
        {
            "value" : "Facet2Obj2",
            "count" : 30000,
            "ID" : [
                0,
                1,
                .
                .
                .
            ]
        },
        {
            "value" : "Facet2Obj3",
            "count" : 30000,
            "ID" : [
                0,
                1,
                .
                .
                .
            ]
        }
    ]
}
{
    "_id" : "Facet3",
    "value" : [
        {
            "value" : "Value3",
            "count" : 30000,
            "ID" : [
                0,
                1,
                2,
                .
                .
                .
            ]
        }
    ]
} 
{
    "_id" : "Facet4",
    "value" : [
        {
            "value" : "Facet4Obj1",
            "count" : 30000,
            "ID" : [
                0,
                1,
                2,
                .
                .
                .
            ]
        }
    ]
}

I inserted 30,000 documents in this format (with different IDs) into MongoDB and then ran a map-reduce, but it was slow: with 30,000 documents it took about 30 minutes. After I added an index on the facets it became somewhat faster, taking about 350 seconds, but with 50,000 documents it again took about 30 minutes. When I check the indexes with db.collection.getIndexes(), MongoDB returns this output:

{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "ns" : "database.collection",
    "name" : "_id_"
},
{
    "v" : 1,
    "key" : {
        "ID" : 1,
        "Facet1" : 1,
        "Facet2" : 1,
        "Facet3" : 1,
        "Facet4" : 1
    },
    "ns" : "database.collection",
    "name" : "ID_1_Facet1_1_Facet2_1_Facet3_1_Facet4_1"
}

Did I do anything wrong with the indexes, given that the map-reduce is still not fast enough? I understand that indexes must be placed strategically, or they can hurt performance instead of helping.

Answers are greatly appreciated and thanks in advance

Upvotes: 2

Views: 2858

Answers (1)

Asya Kamsky

Reputation: 42342

MapReduce passes every document in the collection to the map function unless you pass it the {query: } option, which it uses to pre-filter the documents sent to the map function. You can also pass a {sort: } option to mapReduce, and it will send documents to the map function sorted on the given field(s).

Those are the only two places where indexes are used; after that, everything happens in the JavaScript thread that is spawned for the work.
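As a rough illustration of those two options, here is a minimal mongo shell sketch. The map and reduce functions (mapFacets, reduceFacets), the emitted value shape, and the ID range in the query are hypothetical stand-ins, not the map-reduce from the question. The query and sort are both on ID because ID is the leading key of the compound index shown in the question, so that index should be usable for them; a filter on Facet1 alone would likely not benefit from it.

```javascript
// Hypothetical map function: in the mongo shell, `this` is the current
// document and `emit` is supplied by the server.
function mapFacets() {
    emit("Facet1", { value: this.Facet1, count: 1, ID: [this.ID] });
}

// Hypothetical reduce function: returns the same shape that map emits,
// so it stays correct if MongoDB re-reduces partial results.
function reduceFacets(key, values) {
    var merged = { value: values[0].value, count: 0, ID: [] };
    values.forEach(function (v) {
        merged.count += v.count;
        merged.ID = merged.ID.concat(v.ID);
    });
    return merged;
}

// The only two places where an index can help:
var options = {
    query: { ID: { $lt: 1000 } }, // pre-filter: only matching documents reach map
    sort: { ID: 1 },              // input order: must be backed by an existing index
    out: { inline: 1 }            // return results directly instead of writing a collection
};

// Guarded so the file can also be loaded outside the mongo shell.
if (typeof db !== "undefined") {
    printjson(db.collection.mapReduce(mapFacets, reduceFacets, options));
}
```

Everything inside mapFacets and reduceFacets still runs in JavaScript, so the remaining cost scales with the number of documents that pass the query filter.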

Upvotes: 5
