Indexes with MongoDB Java Driver not improving performance

Question

In my database, I have a collection with 100K documents with the following structure:

{ "_id" : ObjectId("56f2ce94ef4c3043f12141b8"), "a" : "aaaa", "b" : "bbbb", "c" : "cccc" ...}

In Java, after inserting, I call:

myCollection.createIndex(new Document("a", 1));

and in order to query:

FindIterable iterable = DB.getCollection(collection).find(dbobj);

After several tests, the performance is the same with or without the index. I'm happy to provide more information about my operations.
The explain command gives me:

 {
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "db.MyCollection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "a" : /^aaaa.*/i
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "filter" : {
                    "a" : /^aaaa.*/i
                },
                "keyPattern" : {
                    "a" : 1
                },
                "indexName" : "a_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "Modality" : [
                        "[\"\", {})",
                        "[/^aaaa.*/i, /^aaaa.*/i]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "ok" : 1
}

Cydrick Trudel · Accepted Answer

As stated in the comments on the question, MongoDB gets slow when the documents do not all fit in memory, and it gets really slow when the indexes do not fit in memory. This is because MongoDB has to resort to memory paging: some of the content held in memory is written out to disk, and reading it back is slow. In effect, you lose the advantage of having indexed fields.

Ways to avoid this are to:

Increase the amount of RAM you have on your server
Use a sharded configuration containing multiple servers
Limit data duplication across documents
Limit the indexed fields

You can see how much space a collection and its indexes take by running the db.my_collection.stats() command in the MongoDB shell. The output should look like this:

{
   "ns" : "guidebook.restaurants",
   "count" : 25359,
   "size" : 10630398,
   "avgObjSize" : 419,
   "storageSize" : 4104192,
   "capped" : false,
   "wiredTiger" : {
         "metadata" : {
            "formatVersion" : 1
         },
         [...]
      "nindexes" : 4,
      "totalIndexSize" : 626688,
      "indexSizes" : {
         "_id_" : 217088,
         "borough_1_cuisine_1" : 139264,
         "cuisine_1" : 131072,
         "borough_1_address.zipcode_1" : 139264
      },
      "ok" : 1
 }

storageSize shows the amount of storage allocated to the collection's documents, in bytes, and totalIndexSize shows the total size of all its indexes, in bytes. You can see which indexes take up the most space in the indexSizes sub-document.
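To make the indexSizes comparison concrete, here is a minimal Java sketch that ranks indexes from largest to smallest. It assumes the values have already been copied by hand out of the stats() output above into a plain map. The class and method names (IndexSizeReport, largestFirst, sampleIndexSizes) are made up for illustration and are not part of the MongoDB Java driver.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
class IndexSizeReport {

    // Returns index names ordered from largest to smallest size in bytes.
    // List.sort is stable, so indexes of equal size keep their input order.
    static List<String> largestFirst(Map<String, Long> indexSizes) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(indexSizes.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }

    // Values copied by hand from the indexSizes sub-document shown above.
    static Map<String, Long> sampleIndexSizes() {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("_id_", 217088L);
        sizes.put("borough_1_cuisine_1", 139264L);
        sizes.put("cuisine_1", 131072L);
        sizes.put("borough_1_address.zipcode_1", 139264L);
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = sampleIndexSizes();
        for (String name : largestFirst(sizes)) {
            System.out.println(name + " -> " + sizes.get(name) + " bytes");
        }
    }
}
```

With the sample values, the _id_ index comes out on top and cuisine_1 at the bottom. Indexes near the top of the list are the first candidates to review if you want to limit the indexed fields.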

Ideally, you want to have more RAM than storageSize + totalIndexSize, but you really should have more RAM than totalIndexSize.
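The rule of thumb above can be written as a small check. This is only a sketch: the class name MemoryFitCheck, the Fit enum, and the RAM figure in main are assumptions made for illustration, and the two sizes are the ones from the sample stats() output. It also ignores everything else that competes for RAM on a real server.

```java
// Hypothetical helper for illustration; not part of the MongoDB Java driver.
class MemoryFitCheck {

    enum Fit { DATA_AND_INDEXES, INDEXES_ONLY, NEITHER }

    // Applies the rule of thumb: ideally RAM > storageSize + totalIndexSize,
    // and at the very least RAM > totalIndexSize. All values are in bytes.
    static Fit classify(long ramBytes, long storageSize, long totalIndexSize) {
        if (ramBytes > storageSize + totalIndexSize) {
            return Fit.DATA_AND_INDEXES;
        }
        if (ramBytes > totalIndexSize) {
            return Fit.INDEXES_ONLY;
        }
        return Fit.NEITHER;
    }

    public static void main(String[] args) {
        long storageSize = 4104192L;    // from the sample stats() output
        long totalIndexSize = 626688L;  // from the sample stats() output
        long ramBytes = 1L << 30;       // assumed 1 GiB, purely illustrative

        System.out.println(classify(ramBytes, storageSize, totalIndexSize));
    }
}
```

If the result is INDEXES_ONLY, queries that stay within the index can still be fast, but fetching the documents may hit the disk. If it is NEITHER, the tactics listed earlier are worth considering.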
