timelytree

Reputation: 21

How can I increase Mongoose/MongoDB create and update performance for a large number of entries

I've got an Express application that's using Mongoose/MongoDB and am hoping to find the most efficient way to create/update in bulk (all in a single database operation if possible?).

Users upload a CSV on the frontend that is converted to a JSON array of objects and sent to an Express backend. The array ranges anywhere from ~3000 entries to upwards of ~50,000 and is often a combination of new entries that need to be created as well as existing entries that need to be updated. Each entry is called a Deal.

Here is my current (not very performant) solution:

const deals = [
  { deal_id: '887713', foo: 'data', bar: 'data' },
  { deal_id: '922257', foo: 'data', bar: 'data' }
] // each deal contains 5 key/value pairs in the real data array
const len = deals.length
const Model = models.Deal
let created = 0
let updated = 0
let errors = 0
for (let i = 0; i < len; i++) {
  const deal = deals[i]
  const exists = await Model.findOne({ deal_id: deal.deal_id })
  if (exists) {
    exists.foo = deal.foo
    exists.bar = deal.bar
    await exists.save()
    updated += 1
  } else {
    try {
      await Model.create(deal)
      created += 1
    } catch (e) {
      errors += 1
    }
  }
}

Currently the combination of findOne/save or findOne/create is taking approximately 200-300ms for every Deal. For the low end of 3000 entries, that results in 10-15 minutes to process.

I'm not opposed to circumventing Mongoose and using MongoDB directly if that helps.

If possible, I'd like to maintain the ability to count the number of items that were updated and created, as well as the number of errors (this is sent in the response to give users a sense of what succeeded and what failed) - but this is not critical.

Thanks in advance! :)

Upvotes: 2

Views: 861

Answers (1)

Jonathan Nielsen

Reputation: 1492

You'd want to do this with as few database requests as possible. First, you can fetch all relevant documents in one find statement. https://docs.mongodb.com/manual/reference/operator/query/in/

const deals = [
  { deal_id: '887713', foo: 'data', bar: 'data' },
  { deal_id: '922257', foo: 'data', bar: 'data' }
]
const ids = deals.map(deal => deal.deal_id) // An array of all deal_id
const documents = await Model.find({ deal_id: { $in: ids }})

Now we'll make one query that updates everything, with the upsert option set to true. https://docs.mongodb.com/manual/reference/method/db.collection.update/ This ensures that any document that does not already exist is created automatically.

For bulk updates (updating many documents at once), the most efficient approach is to bypass Mongoose and use the MongoDB driver directly with the bulkWrite command. https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/

const operations = deals.map(deal => ({
    // Parentheses around the object literal are required so the arrow
    // function returns it instead of treating the braces as a block
    updateOne: {
        filter: {
            deal_id: deal.deal_id
        },
        update: {
            $set: deal
        },
        upsert: true
    }
}))

const result = await Model.collection.bulkWrite(operations, { ordered: false })

Above I also set { ordered: false }, which tells MongoDB: "Just insert as fast as possible, without regard to the order of the array I gave you." It also continues processing the remaining documents even if one fails, as noted on the bulkWrite documentation page.

The result object from a bulkWrite looks like this

{
   "acknowledged" : true,
   "deletedCount" : 1,
   "insertedCount" : 2,
   "matchedCount" : 2,
   "upsertedCount" : 0,
   "insertedIds" : {
      "0" : 4,
      "1" : 5
   },
   "upsertedIds" : {

   }
}

This means that you'll get a count of how many matches you got, how many of those were updated, and which documents were created (upsertedIds). This is also stated in the documentation for bulkWrite.
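As a sketch of how you could derive the created/updated counts the question asks for, assuming a result object with the shape shown above (note that the driver's BulkWriteResult also carries a modifiedCount field, even though the documentation sample omits it):

```javascript
// Stand-in for the object returned by bulkWrite; with upserts,
// upsertedCount is the number of documents created and modifiedCount
// the number of existing documents that were actually changed.
const result = {
  acknowledged: true,
  matchedCount: 2,
  modifiedCount: 2,
  upsertedCount: 1,
  upsertedIds: { '2': 'someObjectId' }
}

const created = result.upsertedCount
const updated = result.modifiedCount
console.log({ created, updated }) // { created: 1, updated: 2 }
```

For the error count, with { ordered: false } a failing operation surfaces through the write error information on the result/exception rather than stopping the batch, so you can report it alongside these counts.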

A good practice for large data sets is to chunk the bulkWrite into smaller operation arrays to improve performance. A small-to-medium MongoDB server should handle a few thousand documents at a time without issue.
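A minimal sketch of that chunking (the batch size of 1000 is an assumption to tune for your server; the commented loop shows roughly how it would be used against a real model):

```javascript
// Split a large operations array into batches of at most `size` entries.
function chunk (array, size) {
  const chunks = []
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size))
  }
  return chunks
}

// Against a real model this would look roughly like:
//   for (const batch of chunk(operations, 1000)) {
//     await Model.collection.bulkWrite(batch, { ordered: false })
//   }
const batches = chunk(new Array(2500).fill({}), 1000)
console.log(batches.length) // 3 batches: 1000 + 1000 + 500
```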

Please note that none of the code examples are tested. The goal was to point you in the right direction and illustrate some good practices. Best of luck!

Upvotes: 2
