Reputation: 679
I have over 10 million records in my mongo collection that I want to move to some other database.
There are two ways I could achieve that:
Batching data with find
const batchSize = 1000;
const collection = mongo.client.collection('test');
const count = await collection.count();
let iter = 0;
while (iter * batchSize <= count) {
  const dataArr = await collection.find({})
    .sort({ _id: -1 })
    .limit(batchSize)
    .skip(iter * batchSize)
    .toArray();
  // ... write dataArr to the other database here ...
  iter += 1;
}
Using a mongo cursor
// Runs inside a generator (e.g. with co), so each yield awaits the promise.
const cursor = collection.find({}).sort({ _id: -1 });
const batchSize = 1000;
let done = 0;
while (yield cursor.hasNext()) {
  const ids = [];
  for (let i = 0; i < batchSize; i += 1) {
    if (yield cursor.hasNext()) {
      ids.push((yield cursor.next())._id);
    }
  }
  // ... copy the documents for this batch of ids here ...
  done += batchSize;
}
In the first method, I am making a single request for every 1000 documents whereas in the second one I am making 2 requests for every single document. Which is the better method in terms of speed and computation?
Upvotes: 1
Views: 3249
Reputation: 671
The first method is better because, as you said, you are making just one call per 1000 documents, which saves all the network round trips that fetching documents one by one would generate. The second method would spend far more time on the network since it fetches documents individually.
Some tips:
It is never a good idea to use skip in mongo queries because, according to the MongoDB documentation:
The cursor.skip() method requires the server to scan from the beginning of the input results set before beginning to return results. As the offset increases, cursor.skip() will become slower.
Set the batch size to something just below 16MB divided by the average size of your documents. MongoDB has a 16MB limit on the response size, so this minimizes the number of calls you make.
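For illustration, a rough sketch of that calculation, assuming a connected db handle for the source database (the collection name 'test' is taken from the question); avgObjSize is the average document size in bytes reported by the collStats command:

// Derive a batch size from the average document size (sketch, not production code).
const stats = await db.command({ collStats: 'test' });   // avgObjSize is in bytes
const maxResponseBytes = 16 * 1024 * 1024;                // MongoDB's 16MB response limit
// Leave some headroom so a batch stays comfortably under the limit.
const batchSize = Math.max(1, Math.floor((maxResponseBytes * 0.9) / stats.avgObjSize));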
Store the _id values at the interval boundaries and use those ids to create range conditions. Then you can remove the sort, limit and skip calls. This will make a huge impact on performance.
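For illustration, a minimal sketch of one common way to apply that idea (keyset pagination on _id), assuming the same collection handle as in the question. It keeps a cheap sort and limit on the indexed _id but drops skip entirely; the write to the target database is left as a placeholder:

// Keyset-style batching: remember the last _id seen and filter on it
// instead of skipping past already-read documents.
const batchSize = 1000;
let lastId = null;
while (true) {
  const query = lastId === null ? {} : { _id: { $gt: lastId } };
  const batch = await collection.find(query)
    .sort({ _id: 1 })      // walk the _id index in ascending order
    .limit(batchSize)
    .toArray();
  if (batch.length === 0) break;
  lastId = batch[batch.length - 1]._id;
  // ... insert this batch into the target database here ...
}

Because _id is always indexed, each batch is a bounded index scan no matter how deep into the collection it is, whereas skip has to walk past every previously read document.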
Upvotes: 6