Reputation: 2400
I'm using Node to process log files from an application and due to the traffic volumes these can be a gigabyte or so in size each day.
The files are gzipped every night, and I need to read them without having to unzip them to disk first.
From what I understand, I can use zlib to decompress the file into some kind of stream, but I don't know how to get at the data, and I'm not sure how I can then easily handle a line at a time (though I know some kind of loop searching for \n will be involved).
The closest answer I found so far demonstrates piping the stream to a sax parser, but the whole Node pipes/streams thing is a little confusing:
fs.createReadStream('large.xml.gz').pipe(zlib.createUnzip()).pipe(saxStream);
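For what it's worth, I think I can get at the decompressed data with something like the following (completely untested, and app.log.gz is just a placeholder name), but I'm stuck on turning the chunks into lines, since a line can be split across two chunks:

var fs = require('fs')
  , zlib = require('zlib')
  ;

fs.createReadStream('app.log.gz')
  .pipe(zlib.createGunzip())
  .on('data', function (chunk) {
    // chunk is a Buffer of decompressed bytes; lines can span chunk
    // boundaries, which is the part I don't know how to handle cleanly
  })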
Upvotes: 1
Views: 1684
Reputation: 219
You should take a look at sax. It is developed by the isaacs!
I haven't tested this code, but I would start by writing something along these lines:
var Promise = global.Promise || require('es6-promise').Promise // polyfill only if the runtime lacks Promise
  , thr = require('through2')
  , createReadStream = require('fs').createReadStream
  , createUnzip = require('zlib').createUnzip
  , createParser = require('sax').createStream
  ;

function processXml (filename) {
  return new Promise(function (resolve, reject) {
    var unzip = createUnzip()
      , xmlParser = createParser()
      ;

    xmlParser.on('opentag', function (node) {
      // do stuff with the node
    })
    xmlParser.on('attribute', function (attr) {
      // do more stuff with the attribute
    })

    // instead of rejecting, you may handle the error instead
    xmlParser.on('error', reject)
    xmlParser.on('end', resolve)

    createReadStream(filename)
      .on('error', reject) // e.g. the file does not exist
      .pipe(unzip)
      .on('error', reject) // e.g. the file is not valid gzip
      .pipe(xmlParser)
      .pipe(thr(function (chunk, enc, next) {
        // as soon as xmlParser is done with a node, it passes it downstream;
        // change the chunk here if you wish before forwarding it
        next(null, chunk)
      }))
      .resume() // keep the data flowing so backpressure never stalls the parser
  })
}
processXml('large.xml.gz')
  .then(function () {
    console.log('done')
  })
  .catch(function (err) {
    // handle error.
  })
I hope that helps.
Upvotes: 1