Steven Lu

Reputation: 43427

What is a good pattern for correctly appending to a file with async await?

Is it safe to call functions like fs.appendFile() repeatedly? A contrived example you could consider is using child_process.spawn and a for-await-of loop to implement tee in JS. We get file data in chunks and would like to append them to a file while performing other processing. If the appendFile() is promisified and awaited, then execution pauses until we know the file has been written, which might be wasted time. However, if I do not await it, I am no longer certain that my potentially quickly repeated calls to appendFile won't produce an interleaved output file.

Does the node fs library perform the file locking for me? What might be an effective way to do so? Should I write a helper class to offload the work into its own asynchronous appending loop?

Here is my testing so far, which is encouraging, though it can never be conclusive:

#!/usr/bin/env node                                                             
const spawn = require('child_process').spawn;                                   
const fs = require('fs');                                                       
const util = require('util');                                                   
const start = new Date();                                                       
const proc = spawn('seq 1 200000', { shell:true });                             
proc.stdout.setEncoding('utf8');                                                
(async()=>{                                                                     
    for await (const data of proc.stdout) {                                     
        console.log(new Date() - start, `got ${data.length} chars of stdout`);  
        util.promisify(fs.appendFile)('junk', data);                            
    };                                                                          
})();                                                                           

Output:

5 'got 8192 chars of stdout'
7 'got 65536 chars of stdout'
7 'got 65536 chars of stdout'
7 'got 53248 chars of stdout'
8 'got 65536 chars of stdout'
8 'got 65536 chars of stdout'
8 'got 49152 chars of stdout'
9 'got 65536 chars of stdout'
9 'got 65536 chars of stdout'
9 'got 65536 chars of stdout'
9 'got 24576 chars of stdout'
9 'got 65536 chars of stdout'
9 'got 40960 chars of stdout'
10 'got 49152 chars of stdout'
10 'got 32768 chars of stdout'
10 'got 32768 chars of stdout'
10 'got 32768 chars of stdout'
10 'got 32768 chars of stdout'
10 'got 32768 chars of stdout'
11 'got 32768 chars of stdout'
11 'got 32768 chars of stdout'
11 'got 32768 chars of stdout'
11 'got 24576 chars of stdout'
11 'got 24576 chars of stdout'
11 'got 32768 chars of stdout'
12 'got 65536 chars of stdout'
12 'got 65536 chars of stdout'
13 'got 64191 chars of stdout'

diff junk <(sort -n junk) reveals that the output is correct (nothing got interlaced).

When I added await in front of the append call, the time taken to perform the writes increased by about 30%.

I guess based on the results I will just await the appendFile call even though it looks safe to skip that. The performance hit is tiny.

Upvotes: 2

Views: 1904

Answers (2)

jfriend00

Reputation: 707228

Your code is theoretically subject to race conditions: your for loop runs to completion before all the fs.appendFile() operations are done, so multiple fs.appendFile() operations can be in flight at the same time and their writes could clash. This is not a safe way to write code. If you want the writes serialized in a specific order so that they cannot collide, you should guarantee that by the way you write your code, not by the luck of how the timing works out or how the internals of the library code happen to be written.

I have proven in a test app that fs.appendFile() does not acquire exclusive access and does not lock the file, and multiple fs.appendFile() writes from the same program can overwrite one another or interleave in an unpredictable order. This means you are taking a risk by having multiple fs.appendFile() operations in flight at the same time. My guess is that you are getting lucky because of the speed of your disk writes and the ability to handle large chunks at once. If the underlying system had to break the writes up into multiple chunks, you'd have more of a chance of a conflict.

In addition, this luck probably also varies by OS, by file system type and by file system load and performance.

In fact, I've shown that even a simple loop like this where the data for each write is simply 40 rows of this:

'0123456789abcdefghijhklmopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\n'

And, this code:

const fs = require('fs');
// 40 rows of the line shown above
const testData = '0123456789abcdefghijhklmopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\n'.repeat(40);

const numIterations = 100;
for (let i = 0; i < numIterations; i++) {
    fs.appendFile("output.txt", `(${i}):` + testData, function(err) {
        if (err) {
            console.log(i, err);
            process.exit(1);
        }
    });
}

Won't write all the data in order. I get the first four iterations of the loop written in this order:

(0), (2), (3), (1)

then, later there's

(10), (11), (9)

out of order.


If you use for await (...) and make the for loop wait just a tiny amount between iterations, the writes again go in order (no surprise). But I think this just goes to show that it's inherently a race condition; sometimes you can create conditions where the race doesn't usually bite you. That still isn't the right way to code it.


In addition, the promisify stuff in this code isn't doing anything useful:

#!/usr/bin/env node                                                             
const spawn = require('child_process').spawn;                                   
const fs = require('fs');                                                       
const util = require('util');                                                   
const start = new Date();                                                       
const proc = spawn('seq 1 200000', { shell:true });                             
proc.stdout.setEncoding('utf8');                                                
(async()=>{                                                                     
    for await (const data of proc.stdout) {                                     
        console.log(new Date() - start, `got ${data.length} chars of stdout`);  
        util.promisify(fs.appendFile)('junk', data);                            
    };                                                                          
})();   

This code may not be doing what you thought it was. The util.promisify(fs.appendFile) does nothing useful there because you aren't waiting for the promise it returns when you call it. Therefore, it's no different from just calling fs.appendFile() with a callback, except that if you got an error, you would get a complaint about an unhandled rejection.

If you do await that promise, then it will make the code safe as the await will force the for loop to wait for one operation to finish before the next iteration starts.


Now, on to some of your questions:

Does the node fs library perform the file locking for me?

No, it does not. On Windows 10 (where I tested), I could cause a data conflict where data got overwritten when other code in the same app was writing to the same file. I could even get the writes from a single loop to land out of order.

What might be an effective way to do so?

My recommendation for a problem like this would be to pipe the input to an output stream. Open the stream once and pipe the input to it. Skip the loop entirely. Skip calling fs.appendFile() over and over, which has to open the file, seek to the end, write the data, then close the file every time through the loop. That's very inefficient.

Should I write a helper class to offload the work into in its own asynchronous appending loop?

No, just use a stream. A stream is built exactly for this.

#!/usr/bin/env node                                                             
const spawn = require('child_process').spawn;                                   
const fs = require('fs');                                                       
const proc = spawn('seq 1 200000', { shell: true });                            
const outputStream = fs.createWriteStream('junk');
proc.stdout.pipe(outputStream);

What is a good pattern for correctly appending to a file with async await?

If you really want to use async/await here, then you can use await in front of the file operation. That makes it safe because you force the loop to wait for the completion of one write before starting the next.

#!/usr/bin/env node                                                             
const spawn = require('child_process').spawn;                                   
const fs = require('fs');                                                       
const start = new Date();                                                       
const proc = spawn('seq 1 200000', { shell: true });                            
proc.stdout.setEncoding('utf8');                                                
(async () => {                                                                  
    for await (const data of proc.stdout) {                                    
        console.log(new Date() - start, `got ${data.length} chars of stdout`); 
        await fs.promises.appendFile('junk', data);                            
    }                                                                           
})();

Other Notes:

You're using for await (...), so you already have a recent version of node.js. If you want promisified file functions, you can just use the built-in fs.promises interface instead of manually promisifying fs functions.

Upvotes: 6

zfrisch

Reputation: 8660

Not sure what you mean by "safe", but there's nothing wrong with it as long as you don't need the file appended in a specific order. Judging by your question's use of fs.appendFile(), appending in order probably is what you want.

Because you're dealing with Promises, it's likely better to write a class or a constructor so that you can serialize your Promises and do something when they're finished.

This is one that I had written a while back that may help you:


Serial.js

function Serial(promises = []) {
    return {
        promises,
        resolved: [],
        addPromise: function(fn) {
            promises.push(fn);
        },
        resolve: async function(cb = i => i, err = (e) => console.log("trace: Serial.resolve " + e)) {
            try {
                for await (let p of this[Symbol.iterator]()) {}
                return this.resolved.map(cb);
            } catch (e) {
                err(e);
            }
        },
        [Symbol.iterator]: async function*() {
            this.resolved = [];
            for (let promise of this.promises) {
                let p = await promise().catch(e => console.log("trace: Serial[Symbol.iterator] ::" + e));
                this.resolved.push(p);
                yield p;
            }
        }
    }
}
module.exports = Serial;

Index.js

const Serial = require("./js/Serial");

// Keep the returned promise so we can wait on it before the process exits.
const application_complete = new Serial([
    appendPromise1,
    appendPromise2,
    appendPromise3
]).resolve();

process.on("beforeExit", async () => {
    await application_complete;
    console.log("File Appended!");
});

Note: the beforeExit event is just to prevent the node environment from closing before the asynchronous work has finished. The Serial constructor doesn't require it.


Upvotes: 0
