Reputation: 310
I'm looking for ways to improve the upsert performance of my MongoDB application. In my test program I have a 'user' collection with an 'id' (type: Number) and a 'name' (type: string) property. There is a unique index on 'id'.
The problem: when performing a bulk write (ordered: false), updateOne or replaceOne with upsert enabled is about 6 to 8 times slower than insertOne.
My Index:
await getDb().collection('user').createIndex(
  { id: 1 },
  { unique: true, name: "id_index" }
);
Example replaceOne (takes 8.8 seconds for 100,000 users):
operations.push({
  replaceOne: {
    filter: { id: 1 },
    replacement: { id: 1, name: "user 1" },
    upsert: true
  }
});
Example updateOne (takes 8.4 seconds for 100,000 users):
operations.push({
  updateOne: {
    filter: { id: 1 },
    update: { $set: { name: "user 1" } },
    upsert: true
  }
});
Example insertOne (takes 1.3 seconds for 100,000 users):
operations.push({
  insertOne: {
    document: { id: 1, name: "user 1" }
  }
});
NOTE: each time I performed these tests, the collection was emptied and the index was recreated.
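For context, the batches above can be built and timed with a small harness like the following. This is a hedged sketch, not the asker's actual test program: the `buildOps` helper, the collection handle, and the `Date.now()` timing are assumptions.

```javascript
// Sketch of the benchmark: build one of the three batch types for `count`
// users, then run it as a single unordered bulkWrite and report elapsed ms.
function buildOps(kind, count) {
  const ops = [];
  for (let i = 0; i < count; i++) {
    const doc = { id: i, name: `user ${i}` };
    if (kind === 'insertOne') {
      // bulkWrite's insertOne model wraps the document in a `document` key
      ops.push({ insertOne: { document: doc } });
    } else if (kind === 'updateOne') {
      ops.push({
        updateOne: { filter: { id: i }, update: { $set: { name: doc.name } }, upsert: true }
      });
    } else {
      ops.push({
        replaceOne: { filter: { id: i }, replacement: doc, upsert: true }
      });
    }
  }
  return ops;
}

async function timeBulkWrite(collection, kind, count) {
  const start = Date.now();
  await collection.bulkWrite(buildOps(kind, count), { ordered: false });
  return Date.now() - start; // elapsed milliseconds
}
```

Comparing `timeBulkWrite(users, 'insertOne', 100000)` against the upsert variants on an emptied collection reproduces the gap described above.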
Is that to be expected?
Is there anything else I can do to improve upsert performance? I have modified writeConcern on bulkWrite with little to no impact.
Upvotes: 1
Views: 1051
Reputation: 5090
I was 'following' this question to see what might come of it. Seeing no answers over the week, I'll offer my own.
Without further information, I think the evidence you provided yourself is reasonably strong evidence that the answer is 'yes, this is expected'. While we don't know details such as how many updates versus inserts were performed by your test, or what version of the database you are using, there doesn't seem to be anything blatantly wrong with the comparison. The upsert documentation suggests that the database first checks for the existence of documents that would be updated by the command before performing the insert. This is further suggested by the following text a little lower on the same page (emphasis added):
If all update() operations finish the query phase before any client successfully inserts data, and there is no unique index on the name field, each update() operation may result in an insert, creating multiple documents with name: Andy.
Based on all of this, I think it is perfectly reasonable to surmise that the update portion of an upsert operation adds noticeable overhead that is absent for direct insert operations.
Given this information, I think it raises an additional question which is important to consider: why create a unique index on id when there is already one present for the _id field? Certainly each new index (unique or not) adds some overhead, so perhaps it would be best to repurpose the required _id field and its index for your particular needs?
Upvotes: 1
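To make the repurposing idea concrete, here is a minimal sketch of building the same bulk upserts against `_id` instead of a secondary id field. The `buildIdOps` helper name is hypothetical; the point is only that the filter and replacement use the driver's built-in `_id` index, so the separate 'id' index (and the cost of maintaining it on every write) can be dropped.

```javascript
// Sketch: store the application's numeric user id in the built-in `_id`
// field, so the automatic unique index on `_id` serves as the lookup index.
function buildIdOps(users) {
  // users: array of { id, name }
  return users.map(u => ({
    replaceOne: {
      filter: { _id: u.id },                   // uses the default _id index
      replacement: { _id: u.id, name: u.name },
      upsert: true
    }
  }));
}

// Usage (hypothetical collection handle):
// await getDb().collection('user').bulkWrite(buildIdOps(users), { ordered: false });
```

Whether this closes the full 6-8x gap would need measuring; it removes one index's write overhead but not the query phase inherent to upserts.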