Reputation: 133
I'm trying to read a 20-million-line file and convert the line endings from Windows (CRLF) to Mac/Unix (LF). I know it can be done with sed or dos2unix, but dos2unix gives me an error that I don't know how to fix (dos2unix: Binary symbol 0x0008 found at line 625060). So I'm trying to do it in Node.js instead. Here's my code:
var fs = require('fs');
var eol = require('eol');
// read the whole file into memory (this is the step that fails for huge files)
var input = fs.readFileSync(process.argv[2], 'utf8');
// normalize line endings
var output = eol.auto(input);
console.log("Lines fixed! Now writing...");
// write the result next to the original file
fs.writeFile(process.argv[2] + '_fixed.txt', output, function (err) {
  if (err) return console.log(err);
  console.log("Done!"); // log only after the async write actually finishes
});
The problem is that the file is too big, so the read fails with buffer.js:513 throw new Error('"toString()" failed') — Node can't convert a buffer that large into a single string.
Upvotes: 5
Views: 19809
Reputation: 3154
You shouldn't do it synchronously. The best way to deal with big data is streams:
const fs = require('fs');
const eol = require('eol');

let output = '';
// setting the encoding on the stream avoids splitting multi-byte
// characters across chunk boundaries
const readStream = fs.createReadStream(filename, { encoding: 'utf8' });
readStream.on('data', function (chunk) {
  output += eol.auto(chunk);
});
readStream.on('end', function () {
  console.log('finished reading');
  // write to file here.
});
Note that if the converted result is itself too large for one string, you should write each chunk out as you go instead of accumulating it in output.
Upvotes: 8
Reputation: 28529
For very big files, you'd better not read the whole file into memory; read it line by line or in chunks instead. For how to read a big file by lines or by chunks in Node.js, see my answer to node.js: read a text file into an array. (Each line an item in the array.).
Upvotes: 0