Reputation: 464
Old question title:
"node.js readline from net.Socket (process.stdin) causes error: heap out of memory (conversion of net.Socket Duplex to Readable stream)"
... I've changed it because nobody answered, and this seems like an important question for the node.js ecosystem.
The question is how to solve the "heap out of memory" error when reading line by line from a huge stdin. The error does not happen when you dump stdout to a file (e.g. test.log) and feed the 'readline' interface through fs.createReadStream('test.log').
It looks like process.stdin is not a plain Readable stream, as mentioned here: https://nodejs.org/api/process.html#process_process_stdin
To reproduce the issue I've created two scripts. First is to just generate huge amount of data (a.js file):
// a.js
// a loop in this form generates about 7.5G of data
// you can check yourself by running:
// node a.js > test.log && ls -lah test.log
// which returns
// -rw-r--r-- 1 sd staff 7.5G 31 Jan 22:29 test.log
// note: console.log already appends a newline, so the explicit `\n`
// produces a blank line after every entry
for (let i = 0; i < 8000000; i += 1) {
  console.log(`${i} ${".".repeat(1000)}\n`);
}
The script that consumes this through a bash pipe with readline (b.js file):
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: process.stdin, // doesn't work
  // input: fs.createReadStream('test.log'), // works
});

let s;

rl.on('line', line => {
  // deliberately commented out to demonstrate that the issue
  // has nothing to do with anything beyond readline and process.stdin
  // s = line.substring(0, 7);
  //
  // if (s === '100 ...' || s === '400 ...' || s === '7500000') {
  //   process.stdout.write(`${line}\n`);
  // }
});

rl.on('error', e => {
  console.log('general error', e);
});
Now when you run:
node a.js | node b.js
it fails with the error:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
but if you swap
const rl = readline.createInterface({
  input: process.stdin,
});
for
const rl = readline.createInterface({
  input: fs.createReadStream('test.log'),
});
and run
node a.js > test.log
node b.js
everything works fine.
The problem actually comes down to this: how do you convert a net.Socket into a fully functional Readable stream, if that is possible at all?
Basically, it seems impossible to handle a huge amount of data from stdin as a stream, which is the natural way to do it with Unix-style pipes. So despite node.js being brilliant at handling streams, you apparently can't write a program that handles a huge amount of data through Unix-style pipes.
In many cases it should be unnecessary to dump the data to disk and only afterwards process it with fs.createReadStream('test.log'); that workaround exists only because of this limitation.
I thought streams were all about handling huge amounts of data (among other use cases) on the fly, without saving it to disk.
Upvotes: 1
Views: 1418
Reputation: 416
The problem is not the input data size, and not Node, but a faulty design in your data generator: it does not pause/resume data generation on request of the consumer's output stream. Instead of just pushing data with console.log(...), you should interact correctly with the standard output stream and handle the pause and resume signals from that stream.
The file input stream created by fs.createReadStream() is implemented properly: it pauses and resumes as necessary and therefore does not crash the code.
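As a minimal sketch of what a well-behaved generator could look like (the names writeChunk and max are illustrative and not from the original code, and the line count is kept small for brevity): check the boolean that process.stdout.write() returns and, when it reports a full buffer, wait for the 'drain' event before writing more.

```javascript
// backpressure-aware generator sketch: stop writing when the stream's
// internal buffer is full, and resume on the 'drain' event
const max = 1000; // small count for illustration; the question used 8000000
let i = 0;

function writeChunk() {
  let ok = true;
  while (i < max && ok) {
    // write() returns false once the internal buffer is full
    ok = process.stdout.write(`${i} ${".".repeat(1000)}\n`);
    i += 1;
  }
  if (i < max) {
    // continue only after the consumer has drained the buffer
    process.stdout.once('drain', writeChunk);
  }
}

writeChunk();
```

With this pattern the generator never buffers more than the stream's high-water mark ahead of the consumer, which is exactly what fs.createReadStream() does for you on the reading side.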
Upvotes: 1
Reputation: 29436
You can always treat process.stdin as a normal NodeJS stream and handle the reading yourself:
function onReadLine(line) {
  // do stuff with line
  console.info(line);
}

// read input and split into lines
// (note: comparing a single character against os.EOL would fail on
// Windows, where os.EOL is the two-character sequence '\r\n', so we
// split on '\n' and strip a trailing '\r' instead)
let BUFF = '';
process.stdin.on('data', (buff) => {
  const content = buff.toString('utf-8');
  for (let i = 0; i < content.length; i++) {
    if (content[i] === '\n') {
      onReadLine(BUFF.endsWith('\r') ? BUFF.slice(0, -1) : BUFF);
      BUFF = '';
    } else {
      BUFF += content[i];
    }
  }
});

// flush the last line if the input did not end with a newline
process.stdin.on('end', () => {
  if (BUFF.length > 0) {
    onReadLine(BUFF);
  }
});
Example:
# unix
cat ./somefile.txt | node ./script.js
# windows (PowerShell)
Start-Process -FilePath "node" -ArgumentList @(".\script.js") -RedirectStandardInput .\somefile.txt -NoNewWindow -Wait
Upvotes: 1