Shekhar

Reputation: 55

JavaScript heap out of memory - error while inserting into mongodb

I want to insert 1,500,000 documents into MongoDB. First, I query the database and get a list of 15,000 instructors; then, for each instructor, I want to insert 100 courses authored by them.

I run two loops: the outer one loops through all instructors, and the inner one inserts 100 docs for that instructor's id, as in the code below:

const instructors = await Instructor.find();
// instructors contains 15000 instructors
instructors.forEach((insructor) => {
    for(let i=0; i<=10; i++) {
        const course = new Course({
            title: faker.lorem.sentence(),
            description: faker.lorem.paragraph(),
            author: insructor._id,
            prise: Math.floor(Math.random()*11),
            isPublished: 'true',
            tags: ["java", "Nodejs", "javascript"]
        });
        course.save().then(result => {
            console.log(result._id);
            Instructor.findByIdAndUpdate(insructor._id, { $push: { courses: course._id } })
            .then(insructor => {
                console.log(`Instructor Id : ${insructor._id} add Course : ${i} `);
            }).catch(err => next(err));
            console.log(`Instructor id: ${ insructor._id } add Course: ${i}`)
        }).catch(err => console.log(err));
    }
});

Here is my package.json file, where I added something I found on the internet:

{
    "scripts": {
        "start": "nodemon app.js",
        "fix-memory-limit": "cross-env LIMIT=2048 increase-memory-limit"
    },
    "devDependencies": {
        "cross-env": "^5.2.0",
        "faker": "^4.1.0",
        "increase-memory-limit": "^1.0.6",
    }
}

This is my course model definition:

const mongoose = require('mongoose');

const Course = mongoose.model('courses', new mongoose.Schema({
    title: {
        type: String,
        required: true,
        minlength: 3
    },
    author: {
        type: mongoose.Schema.Types.ObjectId,
        ref: 'instructor'
    },
    description: {
        type: String,
        required: true,
        minlength: 5
    },
    ratings: [{
        user: {
            type: mongoose.Schema.Types.ObjectId,
            ref: 'users',
            required: true,
            unique: true
        },
        rating: {
            type: Number,
            required: true,
            min: 0,
            max: 5
        },
        description: {
            type: String,
            required: true,
            minlength: 5
        }
    }],
    tags: [String],
    rating: {
        type: Number,
        min: 0,
        default: 0
    },
    ratedBy: {
        type: Number,
        min: 0,
        default: 0
    },
    prise: {
        type: Number,
        required: function() { return this.isPublished; },
        min: 0
    },
    isPublished: {
        type: Boolean,
        default: false
    }
}));

module.exports = Course;

Upvotes: 4

Views: 2463

Answers (2)

num8er

Reputation: 19372

For a big amount of data you have to use cursors.

The idea is to process each document as soon as you get it from the db.

It's like asking the db for the instructors: the db sends them back in small batches, you work with one batch, and you keep processing until you reach the end of all the batches.

Otherwise await Instructor.find() will load all the data into memory and populate those instances with mongoose methods that you don't need.

Even await Instructor.find().lean() will not give a memory benefit.

A cursor is MongoDB's feature for when you do a find on a collection.

With mongoose it's accessible using: Instructor.collection.find({})
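
As a side note, mongoose query objects also expose a streaming cursor of their own. A minimal sketch, assuming the Instructor model from the question is in scope (an alternative, not part of the native-driver solution below):

// Sketch: a mongoose query cursor hands documents over one at a time
// instead of loading all 15000 instructors into memory at once.
await Instructor.find().cursor().eachAsync(async (instructor) => {
  // handle a single instructor here
  console.log(instructor._id);
});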

Watch this video.


Below I've written a solution for batch-processing the data using a cursor.

Add this somewhere inside the module:

const createCourseForInstructor = (instructor) => {
  const data = {
    title: faker.lorem.sentence(),
    description: faker.lorem.paragraph(),
    author: instructor._id,
    prise: Math.floor(Math.random()*11), // typo: "prise", must be: "price"
    isPublished: 'true',
    tags: ["java", "Nodejs", "javascript"]
  };
  return Course.create(data);
}

const assignCourseToInstructor = (course, instructor) => {
  const where = {_id: instructor._id};
  const operation = {$push: {courses: course._id}};
  return Instructor.collection.updateOne(where, operation, {upsert: false});
}

const processInstructor = async (instructor) => {
  let courseIds = [];
  for(let i = 0; i < 100; i++) {
    try {
      const course = await createCourseForInstructor(instructor)
      await assignCourseToInstructor(course, instructor);
      courseIds.push(course._id);
    } 
    catch (error) {
      console.error(error.message);
    }
  }
  console.log(
    'Created ', courseIds.length, 'courses for', 
    'Instructor:', instructor._id, 
    'Course ids:', courseIds
  );
};

and in your asynchronous block replace your loop with:

const cursor = await Instructor.collection.find({}).batchSize(1000);

while(await cursor.hasNext()) {
  const instructor = await cursor.next();
  await processInstructor(instructor);
}

P.S. I'm using the native collection.find and collection.updateOne for performance, to avoid mongoose using extra heap for the mongoose methods and fields on model instances.

BONUS:

Even if your code runs into the out-of-memory issue again with this cursor solution, run it like in this example (define the size in megabytes according to the server's RAM):

nodemon --expose-gc --max_old_space_size=10240 app.js
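
If you prefer to keep starting the app through the npm script from your package.json, one way (my assumption, not something the flag requires) is to move the options into the existing start script:

{
  "scripts": {
    "start": "nodemon --expose-gc --max_old_space_size=10240 app.js"
  }
}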

Upvotes: 3

trincot

Reputation: 351158

The reason is that you are not awaiting the promises returned by save, and immediately continue with the next iterations of the for and forEach loops. This means you are launching a huge number of (pending) save operations, which will indeed grow the memory usage of the mongodb library.

It would be better to wait for a save (and the chained findByIdAndUpdate) to resolve before continuing with the next iterations.

Since you are apparently in an async function context, you can use await for this, provided that you replace the forEach loop with a for loop (so that you remain in the same function context):

async function yourFunction() {
    const instructors = await Instructor.find();
    for (let instructor of instructors) { // Use `for` loop to allow for more `await`
        for (let i=0; i<10; i++) { // You want 10 times, right?
            const course = new Course({
                title: faker.lorem.sentence(),
                description: faker.lorem.paragraph(),
                author: instructor._id,
                prise: Math.floor(Math.random()*11),
                isPublished: 'true',
                tags: ["java", "Nodejs", "javascript"]
            });
            const result = await course.save();
            console.log(result._id);
            instructor = await Instructor.findByIdAndUpdate(instructor._id, { $push: { courses: course._id } });
            console.log(`Instructor Id : ${instructor._id} add Course : ${i}`);
        }
    }
}

Now all the save operations are serialised: the next only starts when the previous has completed.

Note that I have not included the error handling you had: this is best done with a catch chained to the call of this async function.
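
For example, a minimal way to attach that catch (yourFunction being the placeholder name used above):

yourFunction()
    .then(() => console.log('All courses inserted'))
    .catch(err => console.error(err));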

Upvotes: 0
