Reputation: 85036
So I have a MongoDB instance where I am trying to update data in one collection with data from another collection. The two collections are participants
with about 180k documents and questions
with about 95k documents.
Documents in participants
typically look something like this:
{
"_id" : ObjectId("52f90b8bbab16dd8594b82b4"),
"answers" : [
{
"_id" : ObjectId("52f90b8bbab16dd8594b82b9"),
"question_id" : 2081,
"sub_id" : null,
"values" : [
"Yes"
]
},
{
"_id" : ObjectId("52f90b8bbab16dd8594b82b8"),
"question_id" : 2082,
"sub_id" : 123,
"values" : [
"Would prefer to go alone"
]
},
{
"_id" : ObjectId("52f90b8bbab16dd8594b82b7"),
"question_id" : 2082,
"sub_id" : 456,
"values" : [
"Yes"
]
}
],
"created" : ISODate("2012-03-01T17:40:21Z"),
"email" : "anonymous",
"id" : 65,
"survey" : ObjectId("52f41d579af1ff4221399a7b"),
"survey_id" : 374
}
I am using the query below to perform the update:
db.participants.ensureIndex({"answers.question_id": 1, "answers.sub_id": 1});
print("created index for answer arrays!")
db.questions.find().forEach(function(doc){
db.participants.update(
{
"answers.question_id": doc.id,
"answers.sub_id": doc.sub_id
},
{
$set:
{
"answers.$.question": doc._id
}
},
false,
true
);
});
db.participants.dropIndex({"answers.question_id": 1, "answers.sub_id": 1});
But this takes about 20 minutes to run. I was hoping that adding the index would help with the performance, but it is still pretty slow. Is this index setup correctly considering that I am indexing fields in an array of objects? Can anyone see anything that I am doing that would cause the slowness? Suggestions on where to start looking to improve the performance of this query?
Upvotes: 0
Views: 2224
Reputation: 85036
In case anyone is interested I was able to take the run time of this update query from 20 minutes down to about a minute and a half by using projection when selecting the questions
documents. Since I am only using the _id
, id
and sub_id
fields I was able to do the following:
db.questions.find({},{_id: 1, id: 1, sub_id: 1}).forEach(function(doc){
....
Which drastically improved performance. Hope this helps someone!
Upvotes: 0
Reputation: 151072
I think you need to consider what you are actually doing here in order to understand why the index is not helping and indeed why this operation takes so long.
The first part of the answer is explained by what you are doing here:
db.questions.find()
Now that part alone basically says that you are asking to retrieve every document in your questions
collection. So we can see what you are trying to do is exactly that, as you want to update that content into your participants
collection, particularly the document _id for the "question". But here, by definition of getting all documents, no index will be used.
So what you are doing is looping every document in the questions
, then asking with your update operation to match the participants
record with data from the "question". And what that means is you are pulling "over the wire" all of your 95K documents and sending back "over the wire" your update operation, 95K times. This is not happening on the server and there is network traffic between your application and your MongoDB.
The index itself is not going to do much other than improve the search of each participants
record, which is better than scanning and you should be getting the match. But that's not the part that taking the time, its the fetching of the questions
that will be the largest issue. Also note that if you were updating
So if it's possible to run your update process on a machine that is as close as possible in networking terms to the MongoDB server then that is going to be your best performance improvement. You could also wind back your Write Concern if you want to be a little daring and/or can live with checking the integrity in another opertation, and that will reduce your network traffic and waiting for a response to the update (which is actually happening) if you put it in "fire and forget" mode.
Also see the guide if you are not sure of the concepts:
http://docs.mongodb.org/manual/core/write-concern/
Upvotes: 1