xx77aBs

Reputation: 4768

MongoDB query over indexed field very slow

I have a collection with a large number of documents (32 809 900). All documents have a field called soft_deleted. I've also created an index on that field, {soft_deleted: 1}. Then I tested a few different queries related to the field. Here are my results:

Query                                                  Number of results  Time in milliseconds
db.cgm_egvs.find().count()                             32809900           90
db.cgm_egvs.find({soft_deleted: true}).count()         2820897            688
db.cgm_egvs.find({soft_deleted: false}).count()        29989003           3983
db.cgm_egvs.find({soft_deleted: null}).count()         0                  42
db.cgm_egvs.find({soft_deleted: {$ne: true}}).count()  29989003           82397

Why are query times so different between these queries? I'd expect finding documents where soft_deleted is true or false to take the same amount of time. More importantly, why is querying by != true so much slower than any other query?

Upvotes: 6

Views: 1095

Answers (2)

Pranab Sharma

Reputation: 739

The soft_deleted field has very low cardinality: it has only two distinct values, true and false, so an index on this field brings little benefit. Indexes normally perform better on fields with high cardinality.

In the case of the {soft_deleted: true} query, the number of documents with soft_deleted: true is much smaller than the number with soft_deleted: false, so MongoDB had to scan far fewer index entries. That is why the {soft_deleted: true} query took less time.

Similarly, the {soft_deleted: null} query took less time because the index holds only two distinct values and no entry matches null, so very little scanning is required.

Your final query uses the $ne operator, which is not selective (selectivity is the ability of a query to narrow results using the index), so it took much longer to execute. See https://docs.mongodb.com/v3.0/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why.
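To put rough numbers on the selectivity point, here is a small sketch in plain JavaScript that works only from the counts in the question's table. The helper name matchedFraction is made up for illustration, and nothing here talks to MongoDB.

```javascript
// Counts taken from the question's results table.
var totalDocs = 32809900;
var matched = {
  "{soft_deleted: true}": 2820897,
  "{soft_deleted: false}": 29989003,
  "{soft_deleted: {$ne: true}}": 29989003
};

// Hypothetical helper: fraction of the collection a query matches.
// The closer this is to 1, the less the index can narrow the work.
function matchedFraction(count, total) {
  return count / total;
}

Object.keys(matched).forEach(function (query) {
  var pct = (100 * matchedFraction(matched[query], totalDocs)).toFixed(1);
  console.log(query + " matches " + pct + "% of documents");
});
```

The true query matches about 8.6% of the collection, while the false and $ne queries both match about 91.4%. Since those last two match the same number of documents, this ratio on its own does not account for the gap between them.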

Upvotes: 4

profesor79

Reputation: 9473

I am not sure why the other queries are slow (I am still waiting for an explain dump), but in the case of $ne an extra step is added: the condition is first parsed as an equality match and then wrapped in a negation. See the explain dumps below; the parsedQuery section of the first one contains the $not step.

db.getCollection('a1').find({Level:{$ne:"Info"}}).explain()

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "logi.a1",
    "indexFilterSet" : false,
    "parsedQuery" : {
        "$not" : {
            "Level" : {
                "$eq" : "Info"
            }
        }
    },

db.getCollection('a1').find({Level:"Info"}).explain()

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "logi.a1",
    "indexFilterSet" : false,
    "parsedQuery" : {
        "Level" : {
            "$eq" : "Info"
        }
    },
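To compare the queries from the question in the same way, explain("executionStats") also reports how many index keys and documents each query examined. The sketch below is only a hypothetical helper (summarizeExplain is not a MongoDB function) that pulls those fields out of an explain document. It assumes the executionStats layout used by MongoDB 3.0 and later, and the sample document is invented purely to show the shape.

```javascript
// Hypothetical helper: extract the work counters from an
// explain("executionStats") result so queries can be compared side by side.
function summarizeExplain(explainDoc) {
  var stats = explainDoc.executionStats;
  return {
    returned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis
  };
}

// Made-up sample with tiny numbers, NOT real output from any collection.
var sampleExplain = {
  executionStats: {
    nReturned: 3,
    totalKeysExamined: 4,
    totalDocsExamined: 3,
    executionTimeMillis: 1
  }
};

console.log(summarizeExplain(sampleExplain));

// In the mongo shell, against the collection from the question:
//   summarizeExplain(db.cgm_egvs.find({soft_deleted: {$ne: true}}).explain("executionStats"))
//   summarizeExplain(db.cgm_egvs.find({soft_deleted: false}).explain("executionStats"))
```

If the $ne query reports many more examined documents than the false query for the same number of results, that would point to where the extra time goes.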

Upvotes: 0
