egilchri

Reputation: 761

nodejs: read multiple csv files counting lines, and produce an overall tally at the end

I have the following Node.js code. My intention is to produce a single count of all lines across all the files. However, when I run this code, I just get the count from the smallest file.

I think I understand why. All six file reads are launched in quick succession, and naturally the shortest file finishes first; its close handler doesn't wait for all the other tallies to complete.

My question is: what's the best Node.js approach to this problem? In real life I want to do a more complex operation than incrementing a counter each time, but this gets the idea across.

Should I use promises somehow to do this, or perhaps key off of some other kind of event?

var fs = require("fs");
var readline = require('readline');

var TOTAL_LINES = 0;

var allCSVFiles = ["a", "b", "c", "d", "e", "f"];

allCSVFiles.forEach(function (file, idx, array) {
    var pathToFile = `/scratch/testDir/${file}.csv`;

    var rd = readline.createInterface({
        input: fs.createReadStream(pathToFile),
        // output: process.stdout,
        terminal: false
    });

    rd.on('line', function (line) {
        TOTAL_LINES++;
    })
    .on('close', function () {
        console.log(`closing: ${pathToFile}`);
        // This guard matches the last index in the array, which is not
        // necessarily the last file to finish reading.
        if (idx === array.length - 1) {
            console.log(`Grand total: ${TOTAL_LINES}`);
        }
    });
});

Upvotes: 0

Views: 1000

Answers (2)

egilchri

Reputation: 761

Ok, I think I have an answer to my own question. Please feel free to critique it.

var fs = require("fs");
var readline = require('readline');

var TOTAL_LINES = 0;

var allMyPromises = [];

var allCSVFiles = ["a", "b", "c", "d", "e", "f"];

allCSVFiles.forEach(function (file) {
    allMyPromises.push(readOneFile(file));
});

Promise.all(allMyPromises).then(function (values) {
    console.log(`Grand total: ${TOTAL_LINES}`);
});

function readOneFile(file) {
    return new Promise(function (resolve, reject) {
        var pathToFile = `/scratch/testDir/${file}.csv`;

        var rd = readline.createInterface({
            // reject the promise if the file can't be opened
            input: fs.createReadStream(pathToFile).on('error', reject),
            // output: process.stdout,
            terminal: false
        });

        rd.on('line', function (line) {
            TOTAL_LINES++;
        })
        .on('close', function () {
            console.log(`closing: ${pathToFile}`);
            // resolves with the running global total, not this file's count
            resolve(TOTAL_LINES);
        });
    });
}
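
For what it's worth, here is a variant sketch that avoids the shared TOTAL_LINES global: each promise resolves with its own file's count, and the array of counts that Promise.all collects is summed at the end. The countLines name is mine, not from the code above.

var fs = require("fs");
var readline = require('readline');

function countLines(pathToFile) {
    return new Promise(function (resolve, reject) {
        var count = 0; // per-file counter instead of a shared global
        readline.createInterface({
            input: fs.createReadStream(pathToFile).on('error', reject),
            terminal: false
        })
        .on('line', function () {
            count++;
        })
        .on('close', function () {
            resolve(count);
        });
    });
}

var paths = ["a", "b", "c", "d", "e", "f"].map(function (f) {
    return `/scratch/testDir/${f}.csv`;
});

Promise.all(paths.map(countLines)).then(function (counts) {
    // counts holds one number per file; reduce them into one total
    var total = counts.reduce(function (a, b) { return a + b; }, 0);
    console.log(`Grand total: ${total}`);
});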

Upvotes: 1

Senthil

Reputation: 2246

Yes, you can use promises to do async reading of files. Because of the async nature of Node.js, simply calling fs.readFile on each file results in all the files being processed concurrently.

Ref: http://www.yaoyuyang.com/2017/01/20/nodejs-batch-file-processing.html

That example first creates an empty total-summary file, then shows how to keep appending to it as each promise completes. In your case, before appending to the target summary file, read its existing content to capture the previous line count, add the new file's count, and update the file with the aggregated total (sketched below).
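
A rough, self-contained sketch of that read-modify-write pattern; the totals.txt path and the countLines helper are placeholders I'm assuming, not taken from the linked post:

var fs = require("fs");
var readline = require('readline');

var SUMMARY = '/scratch/testDir/totals.txt'; // assumed location

function countLines(pathToFile) {
    return new Promise(function (resolve, reject) {
        var count = 0;
        readline.createInterface({
            input: fs.createReadStream(pathToFile).on('error', reject),
            terminal: false
        })
        .on('line', function () { count++; })
        .on('close', function () { resolve(count); });
    });
}

fs.writeFileSync(SUMMARY, '0'); // start the summary file at zero

var paths = ["a", "b", "c", "d", "e", "f"].map(function (f) {
    return `/scratch/testDir/${f}.csv`;
});

// Chain the files so each read-modify-write of the summary runs in turn;
// concurrent writes to the same file would race.
paths.reduce(function (chain, path) {
    return chain.then(function () {
        return countLines(path).then(function (count) {
            var previous = parseInt(fs.readFileSync(SUMMARY, 'utf8'), 10);
            fs.writeFileSync(SUMMARY, String(previous + count));
        });
    });
}, Promise.resolve()).then(function () {
    console.log(`aggregated total written to ${SUMMARY}`);
});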

Recommendation: if you have a long-running computation, you should start additional processes (via child_process) to do the work in parallel, and have your Node.js process asynchronously wait for the results (see the sketch after the references below).

Ref: Parallelizing tasks in Node.js

Best way to execute parallel processing in Node.js
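
To make the child_process suggestion concrete, here is a minimal fork sketch; the parent.js and count-lines.js file names are invented for illustration:

// parent.js
var childProcess = require("child_process");

var paths = ["a", "b", "c", "d", "e", "f"].map(function (f) {
    return `/scratch/testDir/${f}.csv`;
});

var pending = paths.length;
var total = 0;

paths.forEach(function (path) {
    // fork one worker per file; each reports its count via IPC
    var worker = childProcess.fork('./count-lines.js', [path]);
    worker.on('message', function (count) {
        total += count;
        if (--pending === 0) {
            console.log(`Grand total: ${total}`);
        }
    });
});

// count-lines.js (the worker: counts one file, reports back, exits)
var fs = require("fs");
var readline = require('readline');

var count = 0;
readline.createInterface({
    input: fs.createReadStream(process.argv[2]),
    terminal: false
})
.on('line', function () { count++; })
.on('close', function () {
    process.send(count);
    process.exit(0);
});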

So please explain your use case.

Upvotes: 1
