So, here's my problem. I have two collections (coll1, coll2), each holding about 1.5 million documents with the same fields. More than 95% of the documents are in both collections, but some coll1 documents have a non-empty email, and coll2 has more documents.
What I want in the end is coll2, but with the emails from coll1.
Here is how I am doing it:
const mongoose = require('mongoose');

// collSchema and the collection names coll and coll2 are defined elsewhere (not shown)
const options = {
  socketTimeoutMS: 0,
  keepAlive: true,
  reconnectTries: 30,
};
mongoose.connect(`mongodb://localhost:27017/coll1`, options);
const Coll1Model = mongoose.model(coll, collSchema);

// Load every coll1 document with a non-empty email, keeping only id and email
Coll1Model.find({ email: { $ne: '' } })
  .select({ id: 1, email: 1, _id: 0 })
  .then((result) => {
    const Coll2Model = mongoose.model(coll2, collSchema);
    const bulk = Coll2Model.collection.initializeUnorderedBulkOp();
    // c is about 390k
    const c = result.length;
    // Queue one update per coll1 document, matched on id in coll2
    for (let i = 0; i < c; i += 1) {
      bulk.find({ id: result[i].id }).updateOne({ $set: { email: result[i].email } });
    }
    bulk
      .execute()
      .then((bulkResult) => {
        console.log(bulkResult);
        console.log('End', new Date());
      })
      .catch((err) => {
        console.log(err);
        console.log('End', new Date());
      });
  })
  .catch((err) => {
    console.log('Error', err);
  });
The problem is that this is far too slow and too resource-hungry: it took about 1 h 30 min to get through 20% of the updates, with CPU usage between 60% and 80%.
I am far from an expert in MongoDB and Mongoose, so if someone has an idea for doing this a better way, I would be happy to hear it.
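I managed to reduce the time from 4-5 hours to about 2-3 minutes by adding an index on id in coll2. Without it, every bulk.find({ id: ... }) has to scan the whole collection (about 1.5 million documents) to find the one document it updates.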
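// update.js
// Create the index on coll2.id first so each bulk find() is an index lookup.
// In the mongo shell createIndex() runs synchronously and takes no callback.
db.coll2.createIndex({ id: 1 });

var bulk = db.coll2.initializeUnorderedBulkOp();
db.coll1.find({ email: { $ne: '' } }).forEach(function (data) {
  bulk.find({ id: data.id }).updateOne({ $set: { email: data.email } });
});
bulk.execute();

I ran it from the command line with: mongo mydb update.js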
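The script above runs in the mongo shell, while the question works from Node.js. A rough Node.js equivalent might look like the sketch below. It assumes the official mongodb driver (v4 or later) rather than Mongoose; the function name copyEmails and the default batch size of 1000 are hypothetical choices, not part of the original answer. It creates the same index on coll2.id, streams coll1 through a cursor instead of loading ~390k documents into memory at once, and sends the updates in unordered bulkWrite batches. It also uses a slightly stricter filter, { email: { $nin: [null, ''] } }, because $ne: '' also matches documents whose email is null or missing, which would overwrite coll2 emails with null.

```javascript
// Sketch only: assumes the official `mongodb` Node.js driver (v4+), whose
// Collection objects provide createIndex(), find() and bulkWrite().
// copyEmails and the default batch size are hypothetical, not from the answer.

/**
 * Copies email from each coll1 document that has one onto the coll2
 * document with the same id. Resolves to the number of modified documents.
 */
async function copyEmails(source, target, batchSize = 1000) {
  // Same fix as the shell script: index coll2.id so every update filter
  // is an index lookup rather than a full collection scan.
  await target.createIndex({ id: 1 });

  let ops = [];
  let modified = 0;

  // Send the queued updates as one unordered bulk write, then start a new batch.
  const flush = async () => {
    if (ops.length === 0) return;
    const result = await target.bulkWrite(ops, { ordered: false });
    modified += result.modifiedCount;
    ops = [];
  };

  // Stream coll1 through a cursor; $nin: [null, ''] skips empty, null and
  // missing emails, and the projection fetches only id and email.
  const cursor = source.find(
    { email: { $nin: [null, ''] } },
    { projection: { id: 1, email: 1, _id: 0 } },
  );
  for await (const doc of cursor) {
    ops.push({
      updateOne: {
        filter: { id: doc.id },
        update: { $set: { email: doc.email } },
      },
    });
    if (ops.length >= batchSize) await flush();
  }
  await flush(); // send the remaining partial batch
  return modified;
}
```

With the driver it could be called roughly as const client = new MongoClient('mongodb://localhost:27017'); await client.connect(); const db = client.db('mydb'); await copyEmails(db.collection('coll1'), db.collection('coll2')); where mydb is the database name from the command above. Calling createIndex() for an index that already exists with the same specification does not rebuild it, so rerunning the function is harmless.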