mark86v1

Reputation: 312

What is the use of MongoDB indexing?

I have a MongoDB collection with millions of documents that all have the same fields. For example:

{
    "_id" : ObjectId("601ade833126047ee8f47182"),
    "file_id" : "60110b7dad0cf20001adcbef",
    "versions" : [
        {
            "local" : 6,
            "s3" : "C71rczduuVOPpMohCpCeBQ3_NARDnTRj"
        }
    ]
}
{
    "_id" : ObjectId("60221d1039acf39e09fbfca5"),
    "file_id" : "5fdb2eb4ad0cf20001f97856",
    "versions" : [
        {
            "local" : 2,
            "s3" : "aCy61Gx_UpTZfY59hNLYryGuWTJO2oPk"
        }
    ]
}
{
    "_id" : ObjectId("60221dc639acf39e09fbfca6"),
    "file_id" : "5fe9c897a675f20001f0a82e",
    "versions" : [
        {
            "local" : 3,
            "s3" : "PHLnYjsRlg3GnEQ_UeDkhWIaJbFRmpw9"
        }
    ]
}
{
    "_id" : ObjectId("6050cbcd6b7aab2cd3958978"),
    "file_id" : "6040ca06a675f2000115985e",
    "versions" : [
        {
            "local" : 2,
            "s3" : "vdFY22JFAzU.cD1Xr0eliuwt00rpJC8j"
        }
    ]
}

My question is: if I run collection.find({"file_id": some_string}), MongoDB has to search the whole collection to find the document with the "file_id" I am looking for. Will indexing "file_id" help reduce the execution time? In my case, every document in the collection has the key "file_id". Will indexing really help in this case?

Upvotes: 0

Views: 303

Answers (1)

Tim Biegeleisen

Reputation: 522712

You asked:

Will Indexing "file_id" help to reduce the execution time?

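Quite possibly, yes: adding an index on the file_id field should dramatically speed up the find query you showed above. The index is keyed on the values of file_id, not on whether the field is present, so it helps even though every document has that key. Try it yourself to find out: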

db.your_collection.createIndex( { "file_id": 1 } )

The above command will, by default, create a B-tree index on the file_id field values. Going into depth about how a B-tree works is out of scope for a single answer, but in summary, if Mongo uses this index to search by file_id, the lookup should perform as O(log N), where N is the number of BSON documents in your collection. On the other hand, running your query as-is, without any index, should result in a full collection scan, which is a linear O(N) operation. The gap grows quickly with collection size: for a few million documents, an index lookup needs on the order of tens of key comparisons, while a full scan may have to read every document.
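To make the O(N) versus O(log N) difference concrete, here is a minimal sketch in plain JavaScript. It does not talk to MongoDB, and it uses binary search over a sorted array as a stand-in for the B-tree, so treat it as an illustration of the growth rates rather than of Mongo's actual implementation. The key values and the helper names (makeSortedKeys, linearScanComparisons, binarySearchComparisons) are made up for the example.

```javascript
// Illustration only: a sorted array plus binary search stands in for the index.
// MongoDB's real index is a B-tree, but the logarithmic growth is the same idea.

// Build n hypothetical file_id values, zero-padded so string order is sorted order.
function makeSortedKeys(n) {
  const keys = [];
  for (let i = 0; i < n; i++) {
    keys.push(String(i).padStart(8, "0"));
  }
  return keys;
}

// Full scan: check keys one by one until a match; returns comparisons made.
function linearScanComparisons(keys, target) {
  let comparisons = 0;
  for (const key of keys) {
    comparisons++;
    if (key === target) break;
  }
  return comparisons;
}

// Binary search: halve the search range on every probe; returns probes made.
function binarySearchComparisons(keys, target) {
  let lo = 0;
  let hi = keys.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    comparisons++;
    if (keys[mid] === target) break;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return comparisons;
}

const N = 1000000;
const keys = makeSortedKeys(N);
const target = keys[N - 1]; // worst case for the scan: the last key

console.log("full scan comparisons:", linearScanComparisons(keys, target));
console.log("binary search comparisons:", binarySearchComparisons(keys, target));
```

With one million keys, the scan touches every key in the worst case, while the binary search needs at most 20 probes. On a real collection, running db.your_collection.find({ "file_id": some_string }).explain("executionStats") before and after creating the index should show the plan change from a COLLSCAN stage to an IXSCAN stage, along with the number of documents examined.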

Upvotes: 1
