Reputation: 3
I have documents that look like this:
{
    "_id": ObjectId("5444fc67931f8b040eeca671"),
    "meta": {
        "SessionID": "45",
        "configVersion": "1",
        "DeviceID": "55",
        "parentObjectID": "55",
        "nodeClass": "79",
        "dnProperty": "16"
    },
    "cfg": {
        "Name": "test"
    }
}
The names and the data are just for testing at the moment, but I have a total of 25 million documents in the DB. I'm using find() to fetch specific documents; in this case the query has four conditions: dnProperty, nodeClass, DeviceID, and configVersion. None of them is unique.
At the moment I have the index set up as simply as (the fields live under "meta", so the index uses the dotted paths):
db.collection.ensureIndex({ "meta.nodeClass": 1, "meta.DeviceID": 1, "meta.configVersion": 1, "meta.dnProperty": 1 })
In other words, I have an index on the four query fields. I still have huge problems when a search matches no documents at all. In my example all the "data" is random values from 1-100, so if I do a find() with one of the values > 100 it takes anywhere from 30-180 seconds and uses all of my 8 GB of RAM; with no RAM left, the computer becomes very, very slow.
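One way to check whether a query is actually using the compound index is explain() in the mongo shell. This is a sketch against a hypothetical collection name (`coll` is a placeholder); in MongoDB of this era the output's "cursor" field shows which access path was chosen:

db.coll.find({
    "meta.nodeClass": "79",
    "meta.DeviceID": "55",
    "meta.configVersion": "1",
    "meta.dnProperty": "16"
}).explain()
// "cursor" : "BtreeCursor ..." means the compound index was used;
// "BasicCursor" means a full collection scan over all 25M documents.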
What would be better indexes? Am I using indexes correctly? Do I simply need more RAM, since it will put "all" of the DB in its working memory? Would you recommend another DB (other than Mongo) that handles this better?
Sorry for multiple questions I hope they are short enough that you can give me an answer.
Upvotes: 0
Views: 43
Reputation: 1935
MongoDB uses memory-mapped files, which means a copy of your data and indexes is kept in RAM and queries are served from RAM. In your scenario the queries are slow because your data + index size is so large that it will not fit in RAM; hence there is a lot of I/O activity to fetch data from disk, which is the bottleneck.
Sharding helps solve this problem: if you partition/shard your data across, for example, 5 machines, then you will have 8 GB * 5 = 40 GB of RAM, which can hold your working set (dataset + indexes) entirely in memory, and the I/O overhead will be reduced, improving performance.
Hence in this case your indexes will not improve performance beyond a certain point; you will need to shard your data across multiple machines. Sharding tends to increase read as well as write throughput roughly linearly. Sharding in MongoDB
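As a minimal sketch of what enabling sharding looks like, run against a mongos router; the database name "mydb", the collection name "docs", and the choice of shard key are all placeholders you would adapt to your own query pattern:

// Enable sharding for the database, then shard the collection.
sh.enableSharding("mydb")
// A shard key built from the queried fields lets mongos route queries;
// a compound key spreads the 25M documents across the shards' RAM.
sh.shardCollection("mydb.docs", { "meta.DeviceID": 1, "meta.nodeClass": 1 })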
Upvotes: 1