A K

Reputation: 1674

Node File Line Count

I got feedback that this Node function has performance issues when obtaining the line count of a file, but I haven't been able to pinpoint the exact cause.

const fs = require("fs");

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        // One callback invocation per byte; 10 is the byte value of "\n"
        buffer.forEach((byte) => {
          if (byte === 10) lineCount++;
        });
      })
      .on("end", () => {
        resolve(lineCount);
      })
      .on("error", reject);
  });
}

Is there a more performant way of obtaining the line count of a file in Node?

Upvotes: 1

Views: 2363

Answers (2)

Emil Vikström

Reputation: 91983

I can only speculate, but buffer.forEach calling a function and doing a comparison for every byte could be the problem. Consider using indexOf to let the runtime find the newlines for you:

const fs = require("fs");

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        let idx = -1;
        lineCount--; // Cancels out the final iteration, in which indexOf returns -1
        do {
          idx = buffer.indexOf(10, idx + 1); // 10 is the byte value of "\n"
          lineCount++;
        } while (idx !== -1);
      })
      .on("end", () => {
        resolve(lineCount);
      })
      .on("error", reject);
  });
}

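This solution uses .indexOf to find the position of the first newline, increments lineCount, and then searches for the next one. The second argument to .indexOf is the offset at which to start searching, so each call resumes just past the previous match. This way the JavaScript loop skips over all the bytes between newlines instead of visiting each one. The loop body runs once for every newline, plus once more for the final search that returns -1; the initial lineCount-- cancels out that extra iteration.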

We are letting the Node runtime do the searching for us. Buffer's indexOf is implemented in native code, which should be faster than calling a JavaScript function for every byte.
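To check the loop arithmetic without reading a file, here is a minimal sketch that pulls the same indexOf search out into a synchronous helper over a single Buffer. The name countNewlines is hypothetical, and it uses a plain while loop instead of the do/while with the initial lineCount--, which should give the same count for every buffer.

```javascript
// Hypothetical helper: counts LF (0x0A) bytes in one Buffer using Buffer#indexOf,
// so the byte-by-byte scan happens inside Node rather than in a JavaScript loop.
function countNewlines(buffer) {
  const NEWLINE = 10; // byte value of "\n"
  let count = 0;
  let idx = buffer.indexOf(NEWLINE);
  while (idx !== -1) {
    count++;
    // Resume the search one byte past the previous match.
    idx = buffer.indexOf(NEWLINE, idx + 1);
  }
  return count;
}

console.log(countNewlines(Buffer.from("a\nb\nc"))); // 2
console.log(countNewlines(Buffer.from("")));        // 0
```

In the streaming version, calling this helper inside the "data" handler and adding its result to lineCount would be equivalent to the loop above.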

On my system this is about twice as fast as running a for loop over the buffer length on a large file (111 MB).
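A rough way to reproduce this kind of comparison on your own machine is to time the per-byte forEach from the question, a plain for loop, and the indexOf search on the same in-memory buffer. This is a minimal sketch, not the measurement described above: the buffer size, the line length, and timing with performance.now() are assumptions, disk I/O is left out entirely, and single-run timings are noisy, so treat the numbers as indicative only.

```javascript
const { performance } = require("perf_hooks");

// Strategy 1: the question's approach, one callback invocation per byte.
function countWithForEach(buffer) {
  let count = 0;
  buffer.forEach((byte) => {
    if (byte === 10) count++;
  });
  return count;
}

// Strategy 2: a plain indexed loop, no function call per byte.
function countWithForLoop(buffer) {
  let count = 0;
  for (let i = 0; i < buffer.length; i++) {
    if (buffer[i] === 10) count++;
  }
  return count;
}

// Strategy 3: let Buffer#indexOf locate each newline.
function countWithIndexOf(buffer) {
  let count = 0;
  let idx = buffer.indexOf(10);
  while (idx !== -1) {
    count++;
    idx = buffer.indexOf(10, idx + 1);
  }
  return count;
}

// Synthetic input (assumption): 200,000 lines of 79 "x" characters plus "\n" (16 MB).
const line = Buffer.from("x".repeat(79) + "\n");
const data = Buffer.alloc(line.length * 200000);
for (let off = 0; off < data.length; off += line.length) line.copy(data, off);

// Runs one strategy over the buffer and prints its result and elapsed time.
function time(label, fn) {
  const start = performance.now();
  const result = fn(data);
  const ms = (performance.now() - start).toFixed(1);
  console.log(`${label}: ${result} newlines in ${ms} ms`);
  return result;
}

const results = [
  time("forEach", countWithForEach),
  time("for loop", countWithForLoop),
  time("indexOf", countWithIndexOf),
];
```

All three strategies should report 200000 newlines; only the timings are expected to differ, and their ratios will vary with the Node version and the average line length.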

Upvotes: 1

Qiaosen Huang

Reputation: 1133

Replacing buffer.forEach with a plain for loop avoids invoking a callback for every byte:

const fs = require("fs");

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        for (let i = 0; i < buffer.length; ++i) {
          if (buffer[i] === 10) lineCount++; // 10 is the byte value of "\n"
        }
      })
      .on("end", () => {
        resolve(lineCount);
      })
      .on("error", reject);
  });
}

For comparison:

original: node index.js 2.38s user 0.29s system 98% cpu 2.713 total

modified: node index2.js 0.18s user 0.04s system 96% cpu 0.225 total
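All the snippets in this thread count newline bytes (0x0A) rather than lines, so a file whose last line has no trailing newline is reported as one line short. If you need the number of lines, a minimal sketch is to also remember whether the last byte seen was a newline. The helper names below are hypothetical, and the file version uses async iteration over the read stream (for await ... of), a swapped-in style rather than the event handlers used above.

```javascript
const fs = require("fs");

// Hypothetical helper: counts lines across a sequence of Buffer chunks,
// treating a final line without a trailing "\n" as a line too.
function createLineCounter() {
  let newlines = 0;
  let lastByte = -1; // -1 means no bytes have been seen yet
  return {
    push(chunk) {
      if (chunk.length === 0) return;
      let idx = chunk.indexOf(10);
      while (idx !== -1) {
        newlines++;
        idx = chunk.indexOf(10, idx + 1);
      }
      lastByte = chunk[chunk.length - 1];
    },
    total() {
      // Add one for an unterminated last line; empty input has 0 lines.
      return newlines + (lastByte !== -1 && lastByte !== 10 ? 1 : 0);
    },
  };
}

// File version: same counting, driven by async iteration over the stream.
async function countFileLines(filePath) {
  const counter = createLineCounter();
  for await (const chunk of fs.createReadStream(filePath)) {
    counter.push(chunk);
  }
  return counter.total();
}
```

Because only the last byte of each chunk is remembered, the result does not depend on where the stream happens to split the file into chunks.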

Upvotes: 3
