Adam Edney

Reputation: 310

Mongo DB - Slow upsert performance

I'm looking into ways to improve the upsert performance of my MongoDB application. In my test program I have a 'user' collection which has an 'id' (type: Number) and a 'name' (type: string) property. There is a unique index on 'id'.

The Problem: when performing a bulk write (ordered: false), updateOne or replaceOne with upsert enabled appears to be about 6 to 8 times slower than insertOne.

My Index:

await getDb().collection('user').createIndex({
    id: 1
}, {
    unique: true,
    name: "id_index"
});

Example replaceOne (takes 8.8 seconds for 100,000 users):

operations.push({
    replaceOne: {
        filter: {id: 1},
        replacement: {id: 1, name: "user 1"},
        upsert: true
    }
})

Example updateOne (takes 8.4 seconds for 100,000 users):

operations.push({
    updateOne: {
        filter: {id: 1},
        update: {$set: {name: "user 1"}},
        upsert: true
    }
})

Example insertOne (takes 1.3 seconds for 100,000 users):

operations.push({
    insertOne: {
        document: {id: 1, name: "user 1"}
    }
})
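
For reference, each operations array is submitted in a single call along these lines (a minimal sketch; the 100,000 loop bound matches the timings above and getDb() is my own connection helper):

    const operations = [];
    for (let i = 1; i <= 100000; i++) {
        operations.push({
            insertOne: {
                document: {id: i, name: `user ${i}`}
            }
        });
    }
    // ordered: false lets the server apply the writes in any order,
    // continuing past individual failures.
    await getDb().collection('user').bulkWrite(operations, {ordered: false});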

NOTE - each time I performed these tests, the collection was emptied and the index was recreated.

Is that to be expected?

Is there anything else I can do to improve upsert performance? I have modified the writeConcern on bulkWrite with little to no impact.

Upvotes: 1

Views: 1051

Answers (1)

user20042973

Reputation: 5090

I was 'following' this question to see what might come of it. Seeing no answers after a week, I'll offer my own.

Without further information, I think the evidence you provided yourself is reasonably strong evidence that the answer is 'yes, this is expected'. While we don't know details such as how many updates versus inserts your test performed or which version of the database you are using, there doesn't seem to be anything blatantly wrong with the comparison. The upsert documentation suggests that the database first checks for the existence of documents that would be updated by the command before performing the insert. This is further suggested by the following text a little lower on the same page (emphasis added):

If all update() operations finish the query phase before any client successfully inserts data, and there is no unique index on the name field, each update() operation may result in an insert, creating multiple documents with name: Andy.

Based on all of this, I think it is perfectly reasonable to surmise that the update portion of an upsert operation carries a noticeable overhead that is absent for direct insert operations.

Given this information, I think it raises a few additional questions which are important to consider:

  1. What was your goal in learning this? Were you just making sure things were configured optimally, or are you currently missing some performance target?
  2. Depending on a variety of factors, perhaps an alternative approach would be to simply attempt the insert (or update) and deal with the exceptions separately afterwards? See the sketch after this list.
  3. Perhaps out of curiosity: what is the purpose of having a separate unique index on id when one is already present on the _id field? Each new index (unique or not) adds some write overhead, so it may be best to repurpose the required _id field and its index to serve your particular needs, which the sketch below also does.
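
To illustrate points 2 and 3 together, here is a rough sketch of the insert-first idea. It is untested and assumes the Node.js driver, where an unordered insertMany that hits duplicates throws a MongoBulkWriteError whose writeErrors entries carry each failed operation's index and the duplicate-key code 11000. It stores the user id in _id, attempts plain inserts, and falls back to targeted updates only for the documents that already existed:

    // 'users' stands in for the 100,000 test objects from the question.
    const docs = users.map(u => ({_id: u.id, name: u.name}));
    try {
        // Unordered inserts keep going past duplicate-key failures.
        await getDb().collection('user').insertMany(docs, {ordered: false});
    } catch (err) {
        // Assumption: err.writeErrors is an array of WriteError objects.
        const duplicates = (err.writeErrors || [])
            .filter(e => e.code === 11000)
            .map(e => docs[e.index]);
        // Fall back to targeted updates for just the pre-existing documents.
        const updates = duplicates.map(d => ({
            updateOne: {
                filter: {_id: d._id},
                update: {$set: {name: d.name}}
            }
        }));
        if (updates.length > 0) {
            await getDb().collection('user').bulkWrite(updates, {ordered: false});
        }
    }

Whether this actually wins depends on your insert/update mix: if most operations turn out to be updates of existing documents, the upsert path may be no worse.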

Upvotes: 1
