Reputation: 55
I want to insert 1,500,000 documents into MongoDB. First I query the database and get a list of 15,000 instructors, and for each instructor I want to insert 100 courses authored by them.
I run two nested loops: the outer one iterates over all instructors, and the inner one inserts 100 docs for that instructor's id, as in the code below:
const instructors = await Instructor.find();
// instructors contains 15000 instructors

instructors.forEach((instructor) => {
  for (let i = 0; i <= 10; i++) {
    const course = new Course({
      title: faker.lorem.sentence(),
      description: faker.lorem.paragraph(),
      author: instructor._id,
      prise: Math.floor(Math.random() * 11),
      isPublished: 'true',
      tags: ["java", "Nodejs", "javascript"]
    });
    course.save().then(result => {
      console.log(result._id);
      Instructor.findByIdAndUpdate(instructor._id, { $push: { courses: course._id } })
        .then(instructor => {
          console.log(`Instructor Id : ${instructor._id} add Course : ${i}`);
        }).catch(err => next(err));
      console.log(`Instructor id: ${instructor._id} add Course: ${i}`);
    }).catch(err => console.log(err));
  }
});
Here is my package.json file, where I put something I found on the internet:
{
  "scripts": {
    "start": "nodemon app.js",
    "fix-memory-limit": "cross-env LIMIT=2048 increase-memory-limit"
  },
  "devDependencies": {
    "cross-env": "^5.2.0",
    "faker": "^4.1.0",
    "increase-memory-limit": "^1.0.6"
  }
}
This is my course model definition:
const mongoose = require('mongoose');

const Course = mongoose.model('courses', new mongoose.Schema({
  title: {
    type: String,
    required: true,
    minlength: 3
  },
  author: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'instructor'
  },
  description: {
    type: String,
    required: true,
    minlength: 5
  },
  ratings: [{
    user: {
      type: mongoose.Schema.Types.ObjectId,
      ref: 'users',
      required: true,
      unique: true
    },
    rating: {
      type: Number,
      required: true,
      min: 0,
      max: 5
    },
    description: {
      type: String,
      required: true,
      minlength: 5
    }
  }],
  tags: [String],
  rating: {
    type: Number,
    min: 0,
    default: 0
  },
  ratedBy: {
    type: Number,
    min: 0,
    default: 0
  },
  prise: {
    type: Number,
    required: function() { return this.isPublished; },
    min: 0
  },
  isPublished: {
    type: Boolean,
    default: false
  }
}));

module.exports = Course;
Upvotes: 4
Views: 2463
Reputation: 19372
For a big amount of data you have to use cursors.
The idea is to process each document as soon as you get it from the db: you ask the db for the instructors, the db sends them back in small batches, and you process each batch until you reach the end of all batches.
Otherwise await Instructor.find() will load all the data into memory and populate the instances with mongoose methods that you don't need. Even await Instructor.find().lean() will not give a memory benefit.
A cursor is MongoDB's feature for when you do a find on a collection. With mongoose it's accessible using Instructor.collection.find({}).
Below I've written a solution for batch processing the data using a cursor.
Add this somewhere inside the module:
const createCourseForInstructor = (instructor) => {
  const data = {
    title: faker.lorem.sentence(),
    description: faker.lorem.paragraph(),
    author: instructor._id,
    prise: Math.floor(Math.random() * 11), // typo: "prise", must be: "price"
    isPublished: 'true',
    tags: ["java", "Nodejs", "javascript"]
  };
  return Course.create(data);
};

const assignCourseToInstructor = (course, instructor) => {
  const where = { _id: instructor._id };
  const operation = { $push: { courses: course._id } };
  return Instructor.collection.updateOne(where, operation, { upsert: false });
};
const processInstructor = async (instructor) => {
  let courseIds = [];
  for (let i = 0; i < 100; i++) {
    try {
      const course = await createCourseForInstructor(instructor);
      await assignCourseToInstructor(course, instructor);
      courseIds.push(course._id);
    }
    catch (error) {
      console.error(error.message);
    }
  }
  console.log(
    'Created ', courseIds.length, 'courses for',
    'Instructor:', instructor._id,
    'Course ids:', courseIds
  );
};
And in your asynchronous block, replace your loop with:
const cursor = await Instructor.collection.find({}).batchSize(1000);
while (await cursor.hasNext()) {
  const instructor = await cursor.next();
  await processInstructor(instructor);
}
P.S. I'm using the native collection.find and collection.updateOne for performance, to avoid mongoose using extra heap for mongoose methods and fields on model instances.
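If you prefer to stay within mongoose, its query API also exposes a cursor that can be combined with lean(); a minimal sketch, assuming the same Instructor model and the processInstructor helper above:

// Sketch only: mongoose's query cursor as an alternative to the native driver cursor.
// .lean() yields plain objects instead of hydrated mongoose documents, keeping heap usage low.
const cursor = Instructor.find({}).lean().batchSize(1000).cursor();
let instructor;
while ((instructor = await cursor.next()) !== null) {
  await processInstructor(instructor);
}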
BONUS:
Even if your code runs into an out-of-memory issue again with this cursor solution, run your code like in this example (define the size in megabytes according to your server's RAM):
nodemon --expose-gc --max_old_space_size=10240 app.js
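If you want those flags applied when starting through npm, one possible tweak (a sketch only, reusing the scripts block from the package.json shown in the question) is:

"scripts": {
  "start": "nodemon --expose-gc --max_old_space_size=10240 app.js"
}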
Upvotes: 3
Reputation: 351158
The reason is that you are not awaiting the promises returned by save, and immediately continue with the next iterations of the for and forEach loops. This means you are launching a huge amount of (pending) save operations, which will indeed grow the memory usage by the mongodb library.
It would be better to wait for a save (and the chained findByIdAndUpdate) to resolve before continuing with the next iterations. Since you are apparently in an async function context, you can use await for this, provided that you replace the forEach loop with a for loop (so that you remain in the same function context):
async function yourFunction() {
  const instructors = await Instructor.find();
  for (let instructor of instructors) { // Use `for` loop to allow for more `await`
    for (let i = 0; i < 10; i++) { // You want 10 times, right?
      const course = new Course({
        title: faker.lorem.sentence(),
        description: faker.lorem.paragraph(),
        author: instructor._id,
        prise: Math.floor(Math.random() * 11),
        isPublished: 'true',
        tags: ["java", "Nodejs", "javascript"]
      });
      const result = await course.save();
      console.log(result._id);
      instructor = await Instructor.findByIdAndUpdate(instructor._id, { $push: { courses: course._id } });
      console.log(`Instructor Id : ${instructor._id} add Course : ${i}`);
    }
  }
}
Now all the save operations are serialised: the next only starts when the previous has completed.
Note that I have not included the error handling you had: this should best be done with a catch call chained to the call of this async function.
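For illustration, a minimal sketch of that pattern, using the yourFunction name from above (the log messages are just placeholders):

// Sketch: chain error handling onto the call of the async function itself.
yourFunction()
  .then(() => console.log('done'))
  .catch(err => console.error(err));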
Upvotes: 0