Reputation: 799
To start off. I am currently using npm fast-csv which is a nice CSV reader/writer that is pretty straightforward and simple. What Im attempting to do is use this in conjunction with iconv to process "accented" character and non-ASCII characters and either convert them to an ASCII equivalent or remove them depending on the character.
My current process Im doing with fast-csv is to bring in a chunk for processing (comes in as one row) via a read stream, pause the read stream, process the data, pipe the data to a write stream and then resume the read stream using a callback. Fast-csv currently knows where to separate the chunks based on the format of the data coming in from the readstream.
The entire process looks like this:
var stream = fs.createReadStream(inputFileName);
function csvPull(source) {
csvWrite = csv.createWriteStream({ headers: true });
writableStream = fs.createWriteStream(outputFileName);
csvStream = csv()
.on("data", function (data) {
csvStream.pause();
processRow(data, function () {
csvStream.resume();
});
})
.on("end", function () {
console.log('END OF CSV FILE');
});
csvWrite.pipe(writableStream);
source.pipe(csvStream);
}
csvPull(stream);
The problem I am currently running into is that Im noticing that for some reason, when my javascript compiles, it does not inherently recognise non-ASCII characters, so I am resorting to having to use npm iconv-lite to encode the data stream as it comes in to something usable. However, this presents a bigger issue as fast-csv will no longer know where to split the chunks (rows) due to the now encoded data. This is a problem due to the sizes of the CSVs I will be working with; it will not be an option to load the entire CSV into the buffer to then decode.
Are there any suggestions on how I might get around this without writing my own CSV parser into my code?
Upvotes: 7
Views: 5234
Reputation: 13692
Try reading your file with binary
for the encoding option. I had to read few csv with some accented characters and it worked fine with that.
var stream = fs.createReadStream(inputFileName, { encoding: 'binary' });
Upvotes: 7
Reputation: 11980
Unless I misunderstand, you should be able to fix this by setting the encoding on the stream to utf-8
(docs).
for the first line:
var stream = fs.createReadStream(inputFileName, {encoding: 'utf8'});
and if needed:
writableStream = fs.createWriteStream(outputFileName, {defaultEncoding: 'utf8'});
Upvotes: 0