Justin Elkow

Reputation: 2943

Node.js readStream for end of large files

I want to occasionally send the last 2kB of my large log file (>100MB) in an email notification. Right now, I am trying the following:

var fs = require('fs');

var endLogBytes = fs.statSync(logFilePath).size;
var endOfLogfile = fs.createReadStream(logFilePath, {start: endLogBytes - 2000, end: endLogBytes - 1, autoClose: true, encoding: 'utf8'});
endOfLogfile.on('data', function(chunk) {
    sendEmailFunction(chunk);
});

Since I just rebooted, my log files are only ~2MB, but as they get larger I am wondering:

1) Does it take a long time to read out the data? (Does Node go through the entire file until it gets to the bytes I want, or does it jump straight to the bytes I want?)

2) How much memory is consumed?

3) When is the memory space freed up? How do I free the memory space?

Upvotes: 1

Views: 1389

Answers (1)

mynameisdaniil

Reputation: 1156

You should not use a ReadStream in this case; since it is a stream, it has to (I suppose) grind through all the preceding data before it gets to the last two kilobytes. So I would just use fs.open and then fs.read with the descriptor of the opened file, like this:

fs.open(logFilePath, 'r', function(e, fd) {
  if (e)
    throw e; // or do whatever you usually do in such situations
  var endLogBytes = fs.statSync(logFilePath).size;
  var endOfLogfile = new Buffer(2048);
  // fs.read(fd, buffer, offset, length, position, callback):
  // fill the buffer from offset 0 with 2048 bytes, starting at
  // file position endLogBytes - 2048
  fs.read(fd, endOfLogfile, 0, 2048, endLogBytes - 2048, function(e, bytesRead, data) {
    if (e)
      throw e;
    fs.closeSync(fd); // release the file descriptor
    // don't forget to data.toString('ascii|utf8|you_name_it')
    sendEmailFunction(data.toString('ascii'));
  });
});

UPDATE: It seems the current implementation of ReadStream is smart enough to read only the required amount of data. See: https://github.com/joyent/node/blob/v0.10.29/lib/fs.js#L1550. It uses fs.open and fs.read under the hood, so you can use ReadStream without worry. Anyway, I would still go with fs.open/fs.read, because it is more explicit, the C way, better style, and so on.
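For completeness, here is a minimal sketch of the stream-based approach (assuming the logFilePath and sendEmailFunction names from the question). Note that the tail may arrive in more than one 'data' event, so collect it before sending:

var fs = require('fs');

var endLogBytes = fs.statSync(logFilePath).size;
var tailStream = fs.createReadStream(logFilePath, {
  start: Math.max(0, endLogBytes - 2000), // guard against files shorter than 2kB
  end: endLogBytes - 1,                   // 'end' is inclusive
  encoding: 'utf8'
});

var tail = '';
tailStream.on('data', function(chunk) {
  tail += chunk; // the tail may span several chunks
});
tailStream.on('end', function() {
  sendEmailFunction(tail);
});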

About memory and freeing it up: you will need at least 2kB of memory for the data buffer, plus some overhead. I don't think there is a way to tell exactly how much overhead it will take; just test it with your target OS and Node version. You can use this module for profiling: https://www.npmjs.org/package/webkit-devtools-agent.
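As a rough in-process check (a sketch only, not a replacement for a real profiler), you can compare process.memoryUsage() before and after allocating the buffer:

var before = process.memoryUsage();
var buf = new Buffer(2048); // the 2kB data buffer from the answer
var after = process.memoryUsage();
// Buffer memory lives outside the V8 heap, so rss is the more telling number here
console.log('heapUsed delta:', after.heapUsed - before.heapUsed, 'bytes');
console.log('rss delta:', after.rss - before.rss, 'bytes');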

Memory will be freed when you no longer hold a reference to the buffer and the GC decides it is a good time to collect some garbage. The GC is non-deterministic (i.e. unpredictable), so you should not try to predict its behaviour or force it to do garbage collection in any way.

Upvotes: 2
