Reputation: 475
Please see the following post for some background: MongoDB C# Driver - Return last modified rows only
After almost two years of running this code, we've been experiencing performance problems lately and as much as I keep on saying that the code is not the issue, Infrastructure are insisting it's because I'm doing full table scans.
The thing is that the problem is environment specific. Our QA environment runs like a dream all the time but Dev and Prod are very slow at times and fine at other - it's very erratic. They have the same data and code on but Dev and Prod have another app that is also running on the database.
My data has an Id as well as an _id (or AuditId) - I group the data by Id and then return the last _id for that record where it was not deleted. We have multiple historic records for the same ID and I would like to return the last one (see original post).
So I have the following method:
private static FilterDefinition<T> ForLastAuditIds<T>(IMongoCollection<T> collection) where T : Auditable, IMongoAuditable
{
var pipeline = new[] { new BsonDocument { { "$group", new BsonDocument { { "_id", "$Id" }, { "LastAuditId", new BsonDocument { { "$max", "$_id" } } } } } } };
var lastAuditIds = collection.Aggregate<Audit>(pipeline).ToListAsync().Result.ToList().Select(_ => _.LastAuditId);
var forLastAuditIds = Builders<T>.Filter.Where(_ => lastAuditIds.Contains(_.AuditId) && _.Status != "DELETE");
return forLastAuditIds;
}
This method is called by the one below, which accepts an Expression that it appends to the FilterDefinition created by ForLastAuditIds.
protected List<T> GetLatest<T>(IMongoCollection<T> collection,
Expression<Func<T, bool>> filter, ProjectionDefinition<T, T> projection = null,
bool disableRoleCheck = false) where T : Auditable, IMongoAuditable
{
var forLastAuditIds = ForLastAuditIds(collection);
var limitedList = (
projection != null
? collection.Find(forLastAuditIds & filter, new FindOptions()).Project(projection)
: collection.Find(forLastAuditIds & filter, new FindOptions())
).ToListAsync().Result.ToList();
return limitedList;
}
Now, all of this works really well and is re-used by all of my code that calls Collections, but this specific collection is a lot bigger than the others and we are getting slowdowns just on that one.
My question is: Is there a way for me to take the aggregate and Filter Builder and combine them to return a single FilterDefinition that I could use without running the full table scan first?
I really hope I am making sense.
Upvotes: 1
Views: 3979
Reputation: 10918
Assuming I fully understand what you want, this should be as easy as this:
First, put a descending index on the LastAuditId
field:
db.collection.createIndex{ "LastAuditId": -1 /* for sorting */ }
Or even extend the index to cover for other fields that you have in your filter:
db.collection.createIndex{ "Status": 1, "LastAuditId": -1 /* for sorting */ }
Make sure, however, that you understand how indexes can/cannot support certain queries. And always use explain() to see what's really going on.
The next step is to realize that you must always filter as much as possible as the very first step to reduce the amount of sorting required.
So, if you need to e.g. filter by Name
then by all means do it as the very first step if your business requirements permit it. Be careful, however, that filtering at the start changes your semantics in the sense that you will get the last modified documents per each Id
that passed the preceeding $match
stage as opposed to the last documents per each Id
that happen to also pass the following $match
stage.
Anyway, most importantly, once you've got a sorted set, you can easily and quickly get the latest full document by using $group
with $first
which - with the right index in place - will not do a collection scan anymore (it'll be an index scan for now and hence way faster).
Finally, you want to run the equivalent of the following MongoDB query through C# leveraging the $$ROOT variable in order to avoid a second query (I can put the required code together for you once you post your Audit
, Auditable
and IMongoAuditable
types as well as any potential serializers/conventions):
db.getCollection('collection').aggregate({
$match: {
/* some criteria that you currently get in the "Expression<Func<BsonDocument, bool>> filter" */
}
}, {
$sort: {
"ModifiedDate": -1 // this will use the index!
}
}, {
$group: {
"_id": "$Id",
"document": { $first: "$$ROOT" } // no need to do a separate subsequent query or a $max/$min across the entire group because we're sorted!
}
}, {
$match: { // some additional filtering depending on your needs
"document.Status": { $ne: "Delete" }
}
})
Lastly, kindly note that it might be a good idea to move to the latest version of MongoDB because they are currently putting a lot of effort into optimizing aggregation cases like yours, e.g. this one: https://jira.mongodb.org/browse/SERVER-9507
Upvotes: 1