kentor

Reputation: 18514

MongoDB query timing out even though I indexed the fields

I am running a query on a very large collection (500m documents) which sometimes times out (at 6 minutes) or takes very long (3-6 minutes).

I indexed all the relevant fields individually (no compound index): tag, trophies, battleLogMonitorFrequency, and profileRefreshedAt are each indexed, and the totalIndexSize is lower than 60% of my memory (45gb of indexes out of 153gb available to MongoDB).

const oneHour: number = 1000 * 60 * 60;
const projection: {} = { tag: 1 };
const filter: {} = {
  battleLogMonitorFrequency: interval,
  profileRefreshedAt: { $lt: new Date(snapDate.valueOf() - interval * oneHour) }
};
const profileCursorTrophies: QueryCursor<IPlayerProfileModel> = PlayerProfile
  .find(filter, projection)
  .sort({ trophies: -1 })
  .limit(50000)
  .lean()
  .cursor();
await profileCursorTrophies.eachAsync(
  (profile: IPlayerProfileModel) => {
    outDatedProfileTags.push(profile.tag);
  },
  { parallel: 100 }
);

My question:

Why does it take so long (to the point of timing out) to get a sorted result? I was under the impression that indexing the fields I sort and filter on would be enough for this query.

Edit: Full query explain results: https://hastebin.com/ofixobasix.bash

Edit 2: Output of getIndexes(): https://hastebin.com/azayojokez.scala

Edit 3: After recommendations to use a compound index for my query, I noticed the results haven't changed at all. The query still takes a very long time to execute. See the following explain results: https://hastebin.com/ragixuqaci.bash

This index has been added:

    {
            "v" : 2,
            "key" : {
                    "battleLogMonitorFrequency" : 1,
                    "profileRefreshedAt" : 1,
                    "trophies" : -1
            },
            "name" : "battleLogMonitorFrequency_1_profileRefreshedAt_1_trophies_-1",
            "ns" : "dbname.playerprofiles",
            "background" : true
    }

Upvotes: 2

Views: 1068

Answers (2)

Amit Phaltankar

Reputation: 3424

Here is where you stand: you created the compound index {battleLogMonitorFrequency: 1, profileRefreshedAt: 1, trophies: -1} and ran into an out-of-memory error during the sort:

    errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."

I followed the steps below to reproduce and solve it.


1. Created a collection of 100m records:

db.myc.count()  // returns 100034080

My query looks like this:

db.myc.find({field1  : 1, field2: {$lt : 800}}).sort({field3 : 1})

The query should return 38.9m records (I know that's huge, but I wanted to load test it).


2. Then I created the index {field1: 1, field2: 1, field3: 1}

Executed the query and got an out-of-memory error during the sort. Here I was able to reproduce the OP's issue.

(snippet of explain)

"executionStats" : {
        "executionSuccess" : false,
        "errorMessage" : "Exec error resulting in state FAILURE :: caused by :: errmsg: \"Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.\"",
        "errorCode" : 96,
        "nReturned" : 0,
        "executionTimeMillis" : 19033,
        "totalKeysExamined" : 322639,
        "totalDocsExamined" : 322639,
        "executionStages" : {

            "inputStage" : {

                "inputStage" : {

                    "inputStage" : {

                        "indexName" : "field1_1_field2_1_field3_1",
                        "isMultiKey" : false,
                        "multiKeyPaths" : {
                            "field1" : [ ],
                            "field2" : [ ],
                            "field3" : [ ]
                        },

                    }
                }
            }
        }
    }

3. (Solution) Changed the order of the fields in the index to {field1: 1, field3: 1, field2: 1} and re-executed the query; this time I got my response. Also, totalDocsExamined and nReturned are the same, which indicates the index is being used fully by the Mongo query optimiser.

"executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 38901493,
        "executionTimeMillis" : 1571781,
        "totalKeysExamined" : 38902394,
        "totalDocsExamined" : 38901493,
        "executionStages" : {

            "inputStage" : {

                "indexName" : "field1_1_field3_1_field2_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "field1" : [ ],
                    "field3" : [ ],
                    "field2" : [ ]
                },

            }
        }
    }

Although my query took a very long time to execute, that is expected because it returns an (unrealistic) 38.9m records. What I was more concerned about is whether Mongo used the correct index, and the answer is yes.


Explanation: The OP's query is a typical instance of Mongo's equality-range-sort problem. When an index is ordered equality field, range field, sort field, Mongo uses it only for the filter and not for the sort, so the sort is performed in memory. To solve this, the range field must go at the end of the compound index, after the sort field (the ESR rule: Equality, Sort, Range).
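As a minimal sketch of that ESR ordering (plain Node.js; `esrIndexKeys` is a hypothetical helper, and the field names are taken from the OP's query):

```javascript
// Hypothetical helper: build compound-index keys per the ESR rule
// (Equality fields first, then Sort fields with their direction, then Range fields last).
function esrIndexKeys(equalityFields, sortSpec, rangeFields) {
  return [
    ...equalityFields.map(f => [f, 1]),
    ...Object.entries(sortSpec),
    ...rangeFields.map(f => [f, 1])
  ];
}

const keys = esrIndexKeys(
  ['battleLogMonitorFrequency'],  // equality: battleLogMonitorFrequency = interval
  { trophies: -1 },               // sort: trophies descending
  ['profileRefreshedAt']          // range: profileRefreshedAt < cutoff
);
// keys: [['battleLogMonitorFrequency', 1], ['trophies', -1], ['profileRefreshedAt', 1]]
// i.e. db.playerprofiles.createIndex({ battleLogMonitorFrequency: 1, trophies: -1, profileRefreshedAt: 1 })
```

Keeping the range predicate last means the index entries already arrive in the requested sort order, so no in-memory SORT stage is needed.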

I found a good article that explains this scenario well.

Upvotes: 3

Pete Garafano

Reputation: 4913

According to the explain() output, the index chosen by the mongod is profileRefreshedAt_1, which required scanning 408295390 keys. That is ~82% of the entire collection. This large result set then requires un-indexed matching to satisfy the battleLogMonitorFrequency clause, meaning each of the 408295390 keys returned by the IXSCAN must be matched by the mongod without the aid of an index, pulling each document off disk into cache to be evaluated.

Looking at the indexes available on the collection, I would suggest creating a compound index on battleLogMonitorFrequency and profileRefreshedAt. Depending on which field has fewer matches for your query, that should come first in the index. For example, given the following conditions:

  • battleLogMonitorFrequency $eq 336 has 100,000 matches
  • profileRefreshedAt $lt new Date("2018-04-29T00:00:00.000Z") has 100,000,000 matches

We would want to create an index of the form {battleLogMonitorFrequency: 1, profileRefreshedAt: 1}, since querying on battleLogMonitorFrequency first reduces the number of keys scanned for the profileRefreshedAt clause.
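That selectivity check can be sketched as follows (plain Node.js; the match counts are the hypothetical figures from the example above, not measured values):

```javascript
// Hypothetical match counts for each clause, as in the example above.
const matches = {
  battleLogMonitorFrequency: 100000,   // $eq 336
  profileRefreshedAt: 100000000        // $lt new Date("2018-04-29T00:00:00.000Z")
};

// Fewer matches = more selective = earlier in the compound index.
const keyOrder = Object.keys(matches).sort((a, b) => matches[a] - matches[b]);
const indexSpec = Object.fromEntries(keyOrder.map(f => [f, 1]));
// indexSpec: { battleLogMonitorFrequency: 1, profileRefreshedAt: 1 }
// i.e. db.playerprofiles.createIndex(indexSpec)
```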

If the conditions are the other way around, then we would reverse the key ordering in the index. See https://docs.mongodb.com/manual/tutorial/create-queries-that-ensure-selectivity/index.html for some more details.

Upvotes: 1
