Reputation: 18514
I am running a query on a very large collection (500m documents) which sometimes times out (after 6 minutes) or takes very long (3-6 minutes).
I indexed all the relevant fields individually (no compound index): tag, trophies, battleLogMonitorFrequency, and profileRefreshedAt are indexed, and the totalIndexSize is below 60% of my memory (45 GB of index size out of 153 GB available for MongoDB).
const oneHour: number = 1000 * 60 * 60;
const projection: {} = { tag: 1 };
const filter: {} = {
    battleLogMonitorFrequency: interval,
    profileRefreshedAt: { $lt: new Date(snapDate.valueOf() - interval * oneHour) }
};

const profileCursorTrophies: QueryCursor<IPlayerProfileModel> = PlayerProfile.find(filter, projection)
    .sort({ trophies: -1 })
    .limit(50000)
    .lean()
    .cursor();

await profileCursorTrophies.eachAsync(
    (profile: IPlayerProfileModel) => {
        outDatedProfileTags.push(profile.tag);
    },
    { parallel: 100 }
);
My question:
Why does it take so long (to the point of timing out) to get a sorted result? I was under the impression that indexing the fields I sort and filter on should be enough for this query.
Edit: Full query explain results: https://hastebin.com/ofixobasix.bash
Edit 2: Output of getIndexes(): https://hastebin.com/azayojokez.scala
Edit 3: After recommendations to use a compound index for my query, I noticed the results haven't changed at all. The query still takes a very long time to execute. See the following explain results: https://hastebin.com/ragixuqaci.bash
This index has been added:
{
    "v" : 2,
    "key" : {
        "battleLogMonitorFrequency" : 1,
        "profileRefreshedAt" : 1,
        "trophies" : -1
    },
    "name" : "battleLogMonitorFrequency_1_profileRefreshedAt_1_trophies_-1",
    "ns" : "dbname.playerprofiles",
    "background" : true
}
Upvotes: 2
Views: 1068
Reputation: 3424
Here is what happened:
You created the compound index {battleLogMonitorFrequency: 1, profileRefreshedAt: 1, trophies: -1} and ran into an out-of-memory error during the sort:
errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."
I followed the steps below to reproduce and solve it.
1. Created a collection of 100m documents:
db.myc.count() returns 100034080
My query looks like this:
db.myc.find({ field1: 1, field2: { $lt: 800 } }).sort({ field3: 1 })
The query should return 38.9m documents (I know that's huge, but I wanted to load-test it).
2. Then I created the index {field1: 1, field2: 1, field3: 1}.
Executed the query and got an out-of-memory error for the sort. With this, I was able to reproduce the OP's issue.
(snippet of explain)
"executionStats" : {
"executionSuccess" : false,
"errorMessage" : "Exec error resulting in state FAILURE :: caused by :: errmsg: \"Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.\"",
"errorCode" : 96,
"nReturned" : 0,
"executionTimeMillis" : 19033,
"totalKeysExamined" : 322639,
"totalDocsExamined" : 322639,
"executionStages" : {
"inputStage" : {
"inputStage" : {
"inputStage" : {
"indexName" : "field1_1_field2_1_field3_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"field1" : [ ],
"field2" : [ ],
"field3" : [ ]
},
}
}
}
}
}
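The reproduction in steps 1-2 above can be sketched in the mongo shell (collection and field names taken from this answer; loading the 100m documents is assumed to have happened already):

```js
// Index in the problematic order: equality (field1), range (field2), sort (field3)
db.myc.createIndex({ field1: 1, field2: 1, field3: 1 });

// The index covers the filter, but the documents come out in field2 order,
// so the sort on field3 must happen in memory and can exceed the 32 MB limit.
db.myc.find({ field1: 1, field2: { $lt: 800 } }).sort({ field3: 1 });
```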
3. (Solution) Changed the order of the fields in the index to {field1: 1, field3: 1, field2: 1}.
Re-executed the query, and this time I got my response. Also, totalDocsExamined and nReturned are the same, which indicates that the index is used perfectly by the Mongo query optimiser.
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 38901493,
"executionTimeMillis" : 1571781,
"totalKeysExamined" : 38902394,
"totalDocsExamined" : 38901493,
"executionStages" : {
"inputStage" : {
"indexName" : "field1_1_field3_1_field2_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"field1" : [ ],
"field3" : [ ],
"field2" : [ ]
},
}
}
}
Although my query took a very long time to execute, that is expected since it returns an (unrealistic) 38.9m documents. What I was more concerned about is whether Mongo used the correct index, and the answer is YES.
Explanation:
The OP's query is a typical scenario for Mongo's Equality, Sort, Range (ESR) issue.
When an index is ordered as equality field, range field, sort field, Mongo uses it only for the filter and not for the sort, so the sort is performed in memory.
To solve this, we need to keep the range field at the end of the compound index, i.e. equality field, sort field, range field.
I found a good article to explain the scenario better.
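The effect of the key order can be seen without MongoDB at all. The sketch below (plain Node.js; the field names follow this answer, the index is simulated as a sorted array of keys) shows that with the equality-range-sort order the matching keys come out of "index order" unsorted on the sort field, forcing a blocking in-memory sort, while the equality-sort-range order yields them already sorted:

```javascript
// Simulate an index as an array of keys sorted by its fields in order.
const docs = [];
for (let f2 = 0; f2 < 10; f2++) {
  for (let f3 = 0; f3 < 10; f3++) {
    docs.push({ field1: 1, field2: f2, field3: f3 });
  }
}

const byFields = (order) => (a, b) => {
  for (const f of order) {
    if (a[f] !== b[f]) return a[f] - b[f];
  }
  return 0;
};

// {field1: 1, field2: 1, field3: 1}: equality, range, sort (problematic).
const ersIndex = [...docs].sort(byFields(["field1", "field2", "field3"]));
// {field1: 1, field3: 1, field2: 1}: equality, sort, range (the fix).
const esrIndex = [...docs].sort(byFields(["field1", "field3", "field2"]));

// Query: { field1: 1, field2: { $lt: 8 } }, sorted by field3.
const matches = (d) => d.field1 === 1 && d.field2 < 8;
const sortedByField3 = (rows) =>
  rows.every((d, i) => i === 0 || rows[i - 1].field3 <= d.field3);

const ersSorted = sortedByField3(ersIndex.filter(matches));
const esrSorted = sortedByField3(esrIndex.filter(matches));
console.log(ersSorted); // false -> a blocking in-memory sort is needed
console.log(esrSorted); // true  -> keys arrive already sorted by field3
```

Applied to the OP's fields, this ordering suggests {battleLogMonitorFrequency: 1, trophies: -1, profileRefreshedAt: 1}, rather than the index the OP created in Edit 3, which has the range field profileRefreshedAt in the middle.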
Upvotes: 3
Reputation: 4913
According to the explain() output, the index chosen by the mongod is profileRefreshedAt_1, and it required scanning 408295390 keys. That is ~82% of the entire collection. This large result set then requires un-indexed matching to satisfy the battleLogMonitorFrequency clause. This means each of the 408295390 keys returned by the IXSCAN needs to be matched by the mongod without the aid of an index, pulling each document off disk and into the cache to be evaluated.
Looking at the indexes available on the collection, I would suggest creating a compound index on battleLogMonitorFrequency and profileRefreshedAt. Whichever field has fewer matches for your query should come first in the index. For example, given the following conditions:

$eq 336 has 100,000 matches
$lt new Date("2018-04-29T00:00:00.000Z") has 100,000,000 matches

we would want to create an index of the form {battleLogMonitorFrequency: 1, profileRefreshedAt: 1}, since querying on battleLogMonitorFrequency first reduces the number of keys scanned for the profileRefreshedAt clause.
If the conditions are the other way around, we would reverse the key ordering in the index. See https://docs.mongodb.com/manual/tutorial/create-queries-that-ensure-selectivity/index.html for more details.
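The suggested index can be created in the mongo shell like this (a sketch; the collection name playerprofiles is taken from the ns field of the OP's getIndexes() output, and background: true mirrors the OP's existing index options):

```js
// Equality field first, assuming battleLogMonitorFrequency is the more
// selective clause; swap the keys if profileRefreshedAt matches fewer keys.
db.playerprofiles.createIndex(
  { battleLogMonitorFrequency: 1, profileRefreshedAt: 1 },
  { background: true }
);
```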
Upvotes: 1