proGrammar
proGrammar

Reputation: 39

MongoDB indexing on variable query

I have a collection of user generated posts. They contain the following fields

_id: String
groupId: String // id of the group this was posted in
authorId: String
tagIds: [String]
latestActivity: Date // updated whenever someone comments on this post
createdAt: Date
numberOfVotes: Number
...some more...

My queries always look something like this...

Posts.find({
  groupId: {$in: [...]},
  authorId: 'xyz', // only SOMETIMES included
  tagIds: {$in: [...]}, // only SOMETIMES included
}, {
  sort: {latestActivity/createdAt/numberOfVotes: +1/-1, _id: -1}
});

So I'm always querying on the groupId, but only sometimes adding tagIds or userIds. I'm also switching out the field on which this is sorted. How would my best indexing strategy look like?

From what I've read so far here on SO, I would probably create multiple compound indices and have them always start with {groupId: 1, _id: -1} - because they are included in every query, they are good prefix candidates. Now, I'm guessing that creating a new index for every possible combination wouldn't be a good idea memory wise. Therefore, should I just keep it like that and only index groupId and _id?

Thanks.

Upvotes: 1

Views: 442

Answers (1)

Onosa
Onosa

Reputation: 1273

You are going in the right direction. With compound indexes, you want the most selective indexes on the left and the ranges on the right. {groupId: 1, _id: -1} satisfies this.

It's also important to remember that compound indexes are used when the keys are in the query from left to right. So, one compound index can cover many common scenarios. If, for example, your index was {groupId: 1, authorId:1, tagIds: 1} and your query was Posts.find({groupId: {$in: [...]},authorId: 'xyz'}), that index would get used (even though tagIds was absent). Also, Posts.find({groupId: {$in: [...]},tagIds: {$in: [...]}}) would use this index (the first and third field of the index was used, so if there isn't a more specific index found by Mongo, this index would be used) . However, Posts.find({authorId: 'xyz',tagIds: {$in: [...]}}) would not use the index because the first field in the index was missing.

Given all of that, I would suggest starting with {groupId: 1, authorId:1, tagIds: 1, _id: -1}. groupId is the only non-optional field in your queries, so it goes on the left before the optional ones. It looks like authorId is more selective than tagIds, so it should go on the left after groupId. You're sorting by _id so that should go on the right. Be sure to Analyze Query performance on the different ways you query the data. Make sure they are all choosing this index (otherwise you'll need to make more tweaks or possibly a second compound index). You could then make other indexes and force the query to use it to do some a-b testing on performance.

Upvotes: 1

Related Questions