Reputation: 115
I have a large array of filenames I need to check, but I also need to respond to network clients. The easiest way is to perform:
for(var i=0;i < array.length;i++) { fs.readFile(array[i], function(err, data) {...}); }
, but array can be of any length, say 100000, so it's not a good idea to perform 100000 reads at once, on the other hand doing fs.readFileSync() can take too long. Also launching next fs.readFile() in callback, like this:
var Idx = 0; function checkFile() { fs.readFile(array[Idx], function (err, data) { Idx++; if (Idx < array.length) { checkFile(); } else { Idx = 0; setTimeout(checkFile, 10000); // start checking files in one second } }); }
is also not a best option, because array[] gets constantly updated by network clients - some items deleted, new added and so on.
What is the best way to accomplish such a task in node.js?
Upvotes: 1
Views: 534
Reputation: 14881
You should stick to your first solution (fs.readFile
). For file I/O, node.js uses a thread pool. The reason is that most unix kernels don't provide efficient asynchronous APIs for the file system. Even if you start 10,000 reads concurrently, only a few reads will actually run and the rest will wait in a queue.
In order to make this answer more interesting, I browsed through node's code again to make sure that things hadn't changed.
Long story short, file I/O uses blocking system calls and is made by a thread pool with at most 4 concurrent threads.
The important code is in libeio, which is abstracted by libuv. All I/O code is wrapped by macros which queue requests. For example:
eio_req *eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data, eio_channel *channel)
{
REQ (EIO_READ); req->int1 = fd; req->offs = offset; req->size = length; req->ptr2 = buf; SEND;
}
REQ
prepares the request and SEND
queues it. We eventually end up in etp_maybe_start_thread
:
static unsigned int started, idle, wanted = 4;
(...)
static void
etp_maybe_start_thread (void)
{
if (ecb_expect_true (etp_nthreads () >= wanted))
return;
(...)
The queue keeps 4 threads running to process the requests. When our read request is finally executed, eio simply use the block read
from unistd.h:
case EIO_READ: ALLOC (req->size);
req->result = req->offs >= 0
? pread (req->int1, req->ptr2, req->size, req->offs)
: read (req->int1, req->ptr2, req->size); break;
Upvotes: 3