Lazar Ljubenović

Reputation: 19764

Why does adding index worsen performance?

I'm evaluating the performance of the following query.

db.products_old.find({ regularPrice: { $lte: 200 } })

The collection has a bit over a million documents, in total around 0.15GB.
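
The plans and timings below come from explain; a minimal way to reproduce them, assuming the same collection, is:

db.products_old.find({ regularPrice: { $lte: 200 } }).explain("allPlansExecution")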


No indexes

This is expected. A full collection scan has to be done.

"executionTimeMillis" : 1019,

"winningPlan" : {
    "stage" : "COLLSCAN",
    "filter" : {
        "regularPrice" : {
            "$lte" : 200
        }
    },
    "direction" : "forward"
},

Index { regularPrice: 1 }

"executionTimeMillis" : 2842,

"winningPlan" : {
    "stage" : "FETCH",
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "regularPrice" : 1
        },
        "indexName" : "regularPrice_1",
        "isMultiKey" : false,
        "multiKeyPaths" : {
            "regularPrice" : [ ]
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
            "regularPrice" : [
                "[-inf.0, 200.0]"
            ]
        }
    }
},

Now it uses the index, but the execution time is noticeably worse. Why?

Also, if the performance is worse, why doesn't Mongo use the COLLSCAN instead of the index that slows down the execution? rejectedPlans is empty, which suggests that no other plan was even considered. Why?


Here's the full allPlansExecution output.

Upvotes: 4

Views: 542

Answers (1)

Pritilender

Reputation: 468

While doing a COLLSCAN, MongoDB reads documents directly from the storage drive and keeps the matching ones in RAM. With an IXSCAN, on the other hand, it reads the index, which stores the indexed values together with pointers to the documents' locations on the storage drive. (Here's a nice visualisation from slide 6 to around slide 20.)

You have a lot of documents in your collection, but you also have a lot of matching entries in your index. The data on the storage drive is not laid out in an order that suits this query (as it is in the index), so when the IXSCAN returns pointers to the 220k+ documents it found, the FETCH stage has to do 220k+ random-access reads from the storage drive, which is slow. A COLLSCAN, on the other hand, does a sequential read, presumably page by page, which is a lot faster than FETCH's random reads.

So to sum up: it's not the index that's slowing you down, it's the FETCH stage. If you still want to use this index and get a faster query, project away everything but the indexed field, e.g. with Mongoose's .select('-_id regularPrice'). That adds a quick PROJECTION stage and lets the query be answered entirely from the index (a covered query). Or, if you need _id as well, add a compound index {regularPrice: 1, _id: 1} instead.
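
In the plain mongo shell the covered-query variant is just a projection; a minimal sketch, assuming the collection and index from the question:

// Exclude _id and keep only regularPrice, so the query can be
// answered entirely from the { regularPrice: 1 } index (covered query).
db.products_old.find(
    { regularPrice: { $lte: 200 } },
    { _id: 0, regularPrice: 1 }
)

// If _id is needed as well, cover it with a compound index instead:
db.products_old.createIndex({ regularPrice: 1, _id: 1 })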

And regarding why Mongo uses the index even though it could know that a collection scan is faster: I think that if it sees a usable index, it will use it. But you can force a collection scan by using the hint method with { $natural: 1 } passed to it.
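
For example, again assuming the collection from the question:

// Force a forward collection scan, bypassing the index.
db.products_old.find({ regularPrice: { $lte: 200 } }).hint({ $natural: 1 })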

Upvotes: 4
