Priyath Gregory

Reputation: 987

Possible data inconsistencies when writing to a file in a loop in Node.js

I have an array with, say, 100,000 objects. I use the map function, and on each iteration I build a string and write the content to a CSV like so:

entriesArray.map((entry) => {
    let str = entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
    writeToFile(str);
});

The writeToFile function:

const fs = require('fs');

const writeToFile = (str) => {
  fs.appendFile(outputFileName + '.csv', str, (err) => {
    if (err) throw err;
  });
};

This works as expected, but I'm concerned that having so many asynchronous write operations could lead to data inconsistencies. So my question is: is this safe, or is there a better way to do it?

By the way, the same code on macOS threw the error Error: ENFILE: file table overflow, open 'output.csv'. After a bit of research, I learned that this is due to macOS having a very low open-file limit. More details on this can be found here.

Again, I'm hoping an improved file-write mechanism could sort this issue out as well.

Upvotes: 1

Views: 482

Answers (1)

jfriend00

Reputation: 707158

You are correct to realize that this is not a good way to code, as there are no guarantees of order with asynchronous writing (particularly if the writes are large and may take more than one actual write operation to disk). And remember that fs.appendFile() actually consists of three asynchronous operations: fs.open(), fs.write(), and fs.close(). As you have seen, this opens a lot of file handles all at once as it tries to do every single write in parallel. None of that is necessary.
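
To make that concrete, here is a rough sketch of what each fs.appendFile() call amounts to, written against the promise-based fs API (simplified; the real implementation handles more edge cases):

const fsp = require('fs').promises;

// Roughly what a single fs.appendFile() call does internally:
// three separate asynchronous operations, each tying up a file handle
async function appendRoughly(file, str) {
    const handle = await fsp.open(file, 'a');  // 1. open (consumes a file handle)
    await handle.write(str);                   // 2. write
    await handle.close();                      // 3. close
}

Multiply that by 100,000 entries all running in parallel and you can see where the ENFILE error comes from.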

I'd suggest you build the text you want to write as a single string and do one write at the end, as there appears to be no reason to write each entry separately. This will also be a lot more efficient:

writeToFile(entriesArray.map((entry) => {
    return entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
}).join(""));

Let's say you had 1000 items in your entriesArray. Your scheme was doing 3000 disk operations: an open, a write, and a close for every single entry. My suggested code does 3 disk operations total. This should be significantly faster and have a guaranteed write order.
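
If the combined string ever became too large to build comfortably in memory (millions of rows, say), a write stream would be the usual middle ground: it keeps a single file handle open and preserves write order while buffering internally. A sketch, assuming the same entry fields as in your code:

const fs = require('fs');

const stream = fs.createWriteStream(outputFileName + '.csv');
for (const entry of entriesArray) {
    // stream.write() queues each chunk; order is preserved
    stream.write(entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n');
}
stream.end();
stream.on('finish', () => { /* all data flushed to disk */ });
stream.on('error', (err) => { /* handle the write error */ });

Because only one file handle is ever open, this also sidesteps the ENFILE error you saw on macOS.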


Also, you really need to think about proper error handling. Using something like:

if (err) throw err;

inside an async callback is NOT proper error handling. That throws into an async event that you have no ability to ever handle. Here's one scheme:

const writeToFile = (str, fn) => {
  fs.appendFile(outputFileName + '.csv', str, (err) => {
    fn(err);
  });
};

writeToFile(entriesArray.map((entry) => {
    return entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
}).join(""), function(err) {
    if (err) {
       // error here
    } else {
       // success here
    }
});
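
If your Node version includes the promise-based fs API, the same idea can be expressed a bit more cleanly (a sketch of the same logic, not a different mechanism):

const fsp = require('fs').promises;

const writeToFile = (str) => fsp.appendFile(outputFileName + '.csv', str);

writeToFile(entriesArray.map((entry) => {
    return entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
}).join("")).then(() => {
    // success here
}).catch((err) => {
    // error here
});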

Upvotes: 3
