Reputation: 641
I have the following piece of code where I iterate through a collection and do another db query and construct an object within its callback. Finally I save that object to another collection.
I wish to call another function after all items have been saved, but can't figure out how. I tried using the async library, specifically async.whilst with a test that the item is not null, but that just throws me into an infinite loop.
Is there a way to identify when all items have been saved?
Thanks!
var cursor = db.collection('user_apps').find({}, {timeout:false});
cursor.each(function (err, item) {
    if (err) {
        throw err;
    }
    if (item) {
        var appList = item.appList;
        var uuid= item.uuid;
        db.collection('app_categories').find({schema_name:{$in: appList}}).toArray(function (err, result) {
            if (err) throw err;
            var catCount = _.countBy(result, function (obj) {
                return obj.category;
            })
            catObj['_id'] = uuid;
            catObj['total_app_num'] = result.length;
            catObj['app_breakdown'] = catCount;
            db.collection('audiences').insert(catObj, function (err) {
                if (err) console.log(err);
            });
        });
    }
    else {
        // do something here after all items have been saved
    }
});
Reputation: 50406
The key here is to use something that respects the callback signal when performing the "loop" operation. The .each() method as used here does not do that, so you need an "async" loop control that signals when each iteration has completed, by invoking its own callback from within the inner callback.
Provided your underlying MongoDB driver is at least version 2, there is a .forEach() method that takes a second callback, which is called when the loop is complete. This is better than .each(), but it does not solve the problem of knowing when the inner "async" .insert() operations have completed.
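To make that limitation concrete, here is a minimal sketch that runs without a database. The fakeForEach helper and the setTimeout "insert" are hypothetical stand-ins, assumed only to mimic the (iterator, endCallback) shape described above; they are not the driver's own code.

```javascript
// Hypothetical stand-in for a cursor's forEach(iterator, endCallback):
// calls iterator once per document, then calls endCallback.
function fakeForEach(docs, iterator, endCallback) {
  setImmediate(function () {
    docs.forEach(iterator);
    endCallback(null);
  });
}

// Starts one simulated async insert per document and reports how many
// had finished at the moment the loop's end callback fired.
function demoForEach(docs, report) {
  var saved = 0;
  fakeForEach(docs, function (doc) {
    // Stand-in for the async .insert(); it completes later, on a timer.
    setTimeout(function () { saved += 1; }, 5);
  }, function (err) {
    report(err, saved);
  });
}

demoForEach([{ uuid: 'a' }, { uuid: 'b' }, { uuid: 'c' }], function (err, saved) {
  console.log('loop ended, inserts finished so far:', saved);
});
```

In this simulation the end callback runs while all three "inserts" are still pending, which is the gap the stream approach is meant to close.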
So a better approach is to use the stream interface returned by .find(), which allows more flow control. There is a .stream() method for backwards compatibility, but modern drivers return the stream interface by default:
var stream = db.collection('user_apps').find({});
stream.on("error",function(err){
    throw err;
});
stream.on("data",function(item) {
    stream.pause(); // pause processing of stream
    var appList = item.appList;
    var uuid= item.uuid;
    db.collection('app_categories').find({schema_name:{ "$in": appList}}).toArray(function (err, result) {
        if (err) throw err;
        var catCount = _.countBy(result, function (obj) {
            return obj.category;
        });
        var catObj = {}; // always re-init
        catObj['_id'] = uuid;
        catObj['total_app_num'] = result.length;
        catObj['app_breakdown'] = catCount;
        db.collection('audiences').insert(catObj, function (err) {
            if (err) console.log(err);
            stream.resume(); // resume stream processing
        });
    });
});
stream.on("end",function(){
    // stream complete and processing done
});
The .pause() method on the stream stops further events from being emitted, so that each result object is processed one at a time. When the callback from .insert() is called, .resume() is called, signifying that processing is complete for that item and that the next item can be processed.
When the stream is complete, everything is done, so the "end" event hook is called to continue your code.
That way, each iteration signals its own completion before the next one starts, and there is a defined "end" event for the completion of all processing. As the control is "inside" the .insert() callback, those operations are waited on as well.
As a side note, you might consider including your "category" information in the source collection, as it seems likely your results could be returned more efficiently using .aggregate() if all the required data were in a single collection.
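Purely as an illustration of that side note, suppose each user_apps document embedded its apps as { schema_name, category } objects in an apps array. That schema is an assumption made up for this sketch, not the one in the question, and buildAudiencePipeline is a hypothetical helper name. A pipeline along these lines could then compute the per-user counts on the server:

```javascript
// Builds an aggregation pipeline for the ASSUMED schema
//   { uuid: ..., apps: [ { schema_name: ..., category: ... }, ... ] }
function buildAudiencePipeline() {
  return [
    // One document per (user, app) pair.
    { $unwind: '$apps' },
    // Count apps per user and category.
    { $group: {
        _id: { uuid: '$uuid', category: '$apps.category' },
        count: { $sum: 1 }
    } },
    // Roll the per-category counts back up into one document per user.
    { $group: {
        _id: '$_id.uuid',
        total_app_num: { $sum: '$count' },
        app_breakdown: { $push: { category: '$_id.category', count: '$count' } }
    } }
  ];
}

// Possible usage (not run here):
// db.collection('user_apps').aggregate(buildAudiencePipeline()).toArray(callback);
```

Note that app_breakdown comes out as an array of { category, count } pairs here rather than the keyed object that _.countBy produces, so downstream code would need to account for that difference.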