Reputation: 5213
I have a large glob of file paths. I'm getting this path list from a streaming glob module https://github.com/wearefractal/glob-stream
I was piping this stream into another stream that created a file read stream
for each path, and I quickly hit some limits. I was getting:
warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit
and also Error: EMFILE, open
I've tried bumping maxListeners,
but I have ~9000 files that would each be creating a stream, and I'm concerned that will eat memory. That number isn't constant, either, and will grow. Am I safe to remove the limit here?
Should I be doing this synchronously? Or should I iterate over the paths and read the files sequentially? Won't a for loop still kick off all the reads at once?
Upvotes: 2
Views: 800
Reputation: 144912
The max listeners thing is purely a warning. setMaxListeners
only controls when that message is printed to the console, nothing else. You can disable it or just ignore it.
The EMFILE
error is your OS enforcing a limit on the number of files (file descriptors) your process can have open at a single time. You could avoid this by increasing the limit with ulimit.
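You can inspect the current limits from your shell; the 4096 below is just an example value you'd tune for your system:

```shell
# Inspect the limits on open file descriptors
ulimit -Sn    # soft limit (the one EMFILE is hitting)
ulimit -Hn    # hard limit (ceiling for the soft limit)

# Raise the soft limit for this shell session and its children, e.g.:
# ulimit -n 4096
```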
Saturating the disk by running many thousands of concurrent filesystem operations won't get you any added performance; in fact, it will hurt, especially on traditional non-SSD drives. So it's a good idea to run only a controlled number of operations at once.
I'd probably use an async queue, which allows you to push the name of every file to the queue in one loop, and then only runs n operations at once. When an operation finishes, the next one in the queue starts.
For example:
var fs = require('fs');
var async = require('async');

// Process at most 2 files at a time
var q = async.queue(function (file, cb) {
    var stream = fs.createReadStream(file.path);
    // ...
    stream.on('end', function() {
        // finish up, then
        cb();
    });
}, 2);
globStream.on('data', function(file) {
    q.push(file);
});

globStream.on('end', function() {
    // We don't want to add the `drain` handler until *after* the globstream
    // finishes. Otherwise, we could end up in a situation where the globber
    // is still running but all pending file read operations have finished.
    q.drain = function() {
        // All done with everything.
    };

    // ...and if the queue is empty when the globber finishes, make sure the done
    // callback gets called.
    if (q.idle()) q.drain();
});
You may have to experiment a little to find the right concurrency number for your application.
Upvotes: 2