Zac Tolley

Reputation: 2400

How can I parse a large delimited text file in Node?

I'm using Node to process log files from an application, and due to the traffic volumes these can be a gigabyte or so in size each day.

The files are gzipped every night, and I need to read them without having to unzip them to disk.

From what I understand, I can use zlib to decompress the file into some form of stream, but I don't know how to get at the data, and I'm not sure how I can then easily handle a line at a time (though I know some kind of while loop searching for \n will be involved).

The closest answer I've found so far demonstrates how to pipe the stream to a sax parser, but the whole Node pipes/streams concept is a little confusing:

fs.createReadStream('large.xml.gz').pipe(zlib.createUnzip()).pipe(saxStream);

Upvotes: 1

Views: 1684

Answers (1)

markuz-gj

Reputation: 219

You should take a look at sax. It is developed by isaacs!

I haven't tested this code, but I would start by writing something along these lines.

// `var Promise = Promise || ...` is hoisted and always falls through to the
// polyfill, so check the global explicitly.
var Promise = global.Promise || require('es6-promise').Promise
, thr = require('through2')
, readline = require('readline')
, createReadStream = require('fs').createReadStream
, createUnzip = require('zlib').createUnzip
, createParser = require('sax').createStream
;

function processXml (filename) {
  return new Promise(function(resolve, reject){
    var source = createReadStream(filename)
    , unzip = createUnzip()
    , xmlParser = createParser()
    , rl
    ;

    xmlParser.on('opentag', function(node){
      // do stuff with the node
    })
    xmlParser.on('attribute', function(attr){
      // do more stuff with the attribute
    })

    // pipe() does not forward errors, so listen on every stream.
    // instead of rejecting, you may handle the error instead.
    source.on('error', reject)
    unzip.on('error', reject)
    xmlParser.on('error', reject)
    xmlParser.on('end', resolve)

    source
    .pipe(unzip)
    .pipe(xmlParser)
    .pipe(thr(function(chunk, enc, next){
      // the parser passes the chunks it has consumed downstream.
      // change the chunk here if you wish
      next(null, chunk)
    }))
    // nothing reads the transformed chunks here, so drain them
    .resume()

    // only needed if you also want the decompressed text a line at a time
    rl = readline.createInterface({
      input: unzip
    })
    rl.on('line', function(line){
      // do stuff with the line
    })
  })
}

processXml('large.xml.gz').then(function(){
  console.log('done')
})
.catch(function(err){
  // handle error.
  console.error(err)
})
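
Since the logs in the question are delimited text rather than XML, a line-at-a-time version may be closer to what is needed. Here is a minimal sketch that uses only Node's built-in fs, zlib and readline modules. The names parseLine and processGzippedLines are made up for illustration, and the naive split assumes that fields never contain the delimiter or an embedded newline.

```javascript
var fs = require('fs')
, zlib = require('zlib')
, readline = require('readline')
;

// Hypothetical helper: split one line into fields on a delimiter.
// Assumes no quoting or escaping of the delimiter inside a field.
function parseLine (line, delimiter) {
  return line.split(delimiter)
}

// Streams a gzipped file, decompresses it in memory and calls onFields
// once per line. Resolves with the number of lines that were read.
function processGzippedLines (filename, delimiter, onFields) {
  return new Promise(function (resolve, reject) {
    var source = fs.createReadStream(filename)
    , gunzip = zlib.createGunzip()
    , count = 0
    , rl
    ;

    // pipe() does not forward errors, so listen on both streams.
    source.on('error', reject)
    gunzip.on('error', reject)

    rl = readline.createInterface({
      input: source.pipe(gunzip)
    , crlfDelay: Infinity // treat \r\n as a single line break
    })

    rl.on('line', function (line) {
      onFields(parseLine(line, delimiter))
      count += 1
    })
    // 'close' fires once the decompressed input has ended.
    rl.on('close', function () {
      resolve(count)
    })
  })
}
```

You would call it with something like processGzippedLines('app.log.gz', '\t', handler), where the file name, delimiter and handler are placeholders for your own. Only the current chunk and line are held in memory, so nothing is unzipped to disk.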

I hope that helps

Upvotes: 1
