Node.js: Processing a stream without running out of memory

Question

I'm trying to read a giant logfile (250,000 lines), parsing each line into a JSON object, and insert each JSON object to CouchDB for analytics.

I'm trying to do this by creating a buffered stream that will process each chunk seperately, but I always run out of memory after about 300 lines. It seems like using buffered streams and util.pump should avoid this, but apparently not.

(Perhaps there are better tools for this than node.js and CouchDB, but I'm interested in learning how to do this kind of file processing in node.js and think it should be possible.)

CoffeeScript below, JavaScript here: https://gist.github.com/5a89d3590f0a9ca62a23

fs = require 'fs'
util = require('util')
BufferStream = require('bufferstream')

files = [
  "logfile1",
]

files.forEach (file)->
  stream = new BufferStream({encoding:'utf8', size:'flexible'})
  stream.split("
")
  stream.on("split", (chunk, token)->
    line = chunk.toString()
    # parse line into JSON and insert in database
  )
  util.pump(fs.createReadStream(file, {encoding: 'utf8'}), stream)

Node.js: Processing a stream without running out of memory

Answers (1)

Related Questions