Reputation: 4495
Dataset
> use temp_index_issue
> db.createCollection("mycollection")
> for (let i=0; i<900000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": "India"}})};
> for (let i=900000; i<1000000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": "USA"}})};
> for (let i=1000000; i<1050000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1), "location": {"country": null}})};
> for (let i=1050000; i<1100000; i++) { db.mycollection.insertOne({"name": "Place - "+ (i+1)})};
I would like to count number of documents where location.country is "India" or field does not exists or field has null value.
> db.mycollection.countDocuments({"location.country": {"$in": ["India", null]}})
1000000 (0.493 sec)
Now if I set index on "location.country" field, it's taking much more time.
> db.mycollection.createIndex({"location.country": 1})
> db.mycollection.countDocuments({"location.country": {"$in": ["India", null]}})
1000000 (1.42 sec)
What I'm missing here?
Upvotes: 0
Views: 883
Reputation: 59455
It's a common misunderstanding that an index is always faster.
Check the statistics with db.mycollection.stats()
, here an extract:
{
"ns" : "so.mycollection",
"size" : 825.0,
"count" : 11000.0,
"storageSize" : 172.0,
"nindexes" : 2.0,
"totalIndexSize" : 164.0,
"totalSize" : 336.0,
"indexSizes" : {
"_id_" : 112.0,
"location.country_1" : 52.0
},
"scaleFactor" : 1024.0,
}
Without index Mongo has to read all documents from disk, i.e. 172 kiB of data.
With index, Mongo first reads the index from disk, in this case app. 52 kiB * (10000/11000) = 47 kiB of data and then the actual documents, in this case app. 172 kiB * (10000/11000) = 156 kiB.
So, with an index Mongo reads totally 203 kiByte data from disk compared to 172 kiByte without an index.
An index on a field with low cardinality (cardinality is the number of distinct values in relation to number of all values) usually does not improve the performance.
This is not specific for MongoDB, it applies for any database. Exception are Bitmap-Indexes, which are optimized for low-cardinality fields. According to my knowledge from the major database systems only Oracle supports Bitmap-Indexes.
Most likely this query db.mycollection.find({ "location.country": { "$in": ["USA", null] } })
will be much faster as you probably expect.
Upvotes: 2