Reputation: 3691
I'm trying to read a giant logfile (250,000 lines), parsing each line into a JSON object, and insert each JSON object to CouchDB for analytics.
I'm trying to do this by creating a buffered stream that will process each chunk seperately, but I always run out of memory after about 300 lines. It seems like using buffered streams and util.pump should avoid this, but apparently not.
(Perhaps there are better tools for this than node.js and CouchDB, but I'm interested in learning how to do this kind of file processing in node.js and think it should be possible.)
CoffeeScript below, JavaScript here: https://gist.github.com/5a89d3590f0a9ca62a23
fs = require 'fs'
util = require('util')
BufferStream = require('bufferstream')
files = [
"logfile1",
]
files.forEach (file)->
stream = new BufferStream({encoding:'utf8', size:'flexible'})
stream.split("\n")
stream.on("split", (chunk, token)->
line = chunk.toString()
# parse line into JSON and insert in database
)
util.pump(fs.createReadStream(file, {encoding: 'utf8'}), stream)
Upvotes: 0
Views: 3370
Reputation: 368
Maybe this helps: Memory leak when using streams in Node.js?
Try to use pipe()
to solve it.
Upvotes: 2