none
none

Reputation: 1757

MongoDB Compound Index

I need compound index for my collection, but I'm not sure about keys order

My item:

 {
     _id,
     location: {
        type:       "Point",
        coordinates: [<lng>, <lat>]
     },
     isActive: true,
     till:  ISODate("2016-12-29T22:00:00.000Z"),
     createdAt : ISODate("2016-10-31T12:02:51.072Z"),

     ...

 }

My main query is:

 db.collection.find({
    $and: [
        {
            isActive: true
        }, {
            'till': {
                $gte: new Date()
            }
        },
        {
            'location': { $geoWithin: { $box: [ [ SWLng,SWLat], [ NELng, NELat] ] }  }
        }

    ]
 }).sort({'createdAt': -1 })

In human, I need all active items on visible part of my map that are not expired, newly added - first.

Is it normal to create this index:

  db.collection.createIndex( { "isActive": 1, "till": -1, "location": "2dsphere", "createdAt": -1 } )

And what is the best order for performance, for disk usage? Or maybe I have to create several indexes...

Thank you!

Upvotes: 2

Views: 651

Answers (2)

love gupta
love gupta

Reputation: 529

In Mongo, many things depend upon data and its access patterns. There are few things to consider while creating index on your collection-

  1. How the data will be accessed from application. (You already know the main query so this part is almost done)
  2. The data size and cardinality and data span.
  3. Operations on the data. (how often reads and writes will happen and in what pattern)
  4. A particular query can use only one index at a time.
  5. Index usage is not static. Mongo keeps changing index used by heuristics and it tries to do it in optimized way. So if you see index1 being used at soem time, it may happen that mongo uses index2 after some time when some/enough different type/cardinality of data is entered.

Indices can be good and worse as well for your application performance. It is best to test them via shell/compass before using them in production.

var ex = db.<collection>.explain("executionStats")

Above line when entered in mongo shell gives you the cursor on explainable object which can be used further to check performance issues.

ex.find(<Your query>).sort(<sort predicate>)

Points to note in above output are

  • "executionTimeMillis"
  • "totalKeysExamined"
  • "totalDocsExamined"
  • "stage"
  • "nReturned"

We strive for minimum for first three items (executionTimeMillis, totalKeysExamined and totalDocsExamined) and "stage" is one important thing to tell what is happening. If Stage is "COLLSCAN" then it means it is looking for every document to fulfil the query, if Stage is "SORT" then it means it is doing in-memory sorting. Both are not good.

Coming to your query, there are few things to consider-

  1. If "till" is going to have a fix value like End of month date for all the items entered during a month then it's not a good idea to have index on it. DB will have to scan many documents even after this index. Moreover there will be only 12 entries for this index for a year given it is month end date.
  2. If "till" is a fix value after "createdAt" then it is not good to have index on both.
  3. Indexing "isActive" is not good because there are only two values it can take.

So please try with the actual data and execute below indices and determine which index should fit considering time, no. of docs. examined etc.

1. {"location": "2dsphere" , "createdAt": -1}
2. {"till":1, "location": "2dsphere" , "createdAt": -1}

Apply both indices on collection and execute ex.find().sort() where ex is explainable cursor. Then you need to analyze both outputs and compare to decide best.

Upvotes: 2

Oleks
Oleks

Reputation: 1633

The order of fields in an index should be:

  1. fields on which you will query for exact values.
  2. fields on which you will sort.
  3. fields on which you will query for a range of values.

In your case it would be:

db.collection.createIndex( { "isActive": 1, "createdAt": -1, "till": -1, "location": "2dsphere"  } )

However, indexes on boolean fields are often not very useful, as on average MongoDB will still need to access half of your documents. So I would advise you to do the following:

  1. duplicate the collection for test purposes
  2. create index, you would like to test (i.e. {"isActive": 1, "createdAt": -1, "till": -1, "location": "2dsphere" })
  3. in the mongo shell create variable:

    var exp = db.testCollection.explain('executionStats')

  4. execute the following query exp.find({'you query'}) it will return statistics describing the execution of the winning plan

  5. analyze the keys like: "nReturned", "totalKeysExamined","totalDocsExamined"
  6. drop the index, create new one (i.e. {"createdAt": -1, "till": -1, "location": "2dsphere"}), execute exp.find({'you query'}) compare the result to the previous one

Upvotes: 4

Related Questions