Reputation: 137
I am trying to convert a very large JSON to CSV in Node.js, but it is taking too much time and also driving the CPU to 100% during the conversion.
jsonToCsv: function (data) {
    var keys = Object.keys(data[0]);
    var csv = [keys.join(",")];
    console.time("CSVGeneration");
    data.forEach(function (row) {
        var line = '';
        keys.forEach(function (key) {
            if (typeof row[key] === 'string') {
                row[key] = "" + file_utils.escapeCsv(row[key]) + "";
            }
            line += row[key] + ",";
        });
        csv.push(line);
    });
    console.timeEnd("CSVGeneration");
    csv = csv.join("\n");
    return csv;
},
escapeCsv: function (x) {
    if (x)
        return ('' + x.replace(/"/g, '').replace(/,/g, ' ').replace(/\n/g, " ").replace(/\r/g, " ") + '');
    else
        return ('');
},
On an average run with 100,000 (1 lakh) rows, it never even got as far as logging the time; I had to kill the process manually.
Can someone suggest a better alternative to this?
Upvotes: 2
Views: 2978
Reputation: 74046
Before answering this: assuming your code is working, this question belongs on https://codereview.stackexchange.com/.
As for your problem:

- forEach() loops, while rather comfortable to write, are usually not that performant. A simple for loop is the better choice in performance-critical situations.
- In escapeCsv() you apply 4 different regex replacements, each for just one character. Combine those into one.
- As data is an array of objects (each having the same properties), it is not necessary to retrieve the keys individually for each object.

Applying this yields the following code:
function escapeCsv(x) {
    if (x) {
        return ('' + x).replace(/[",\n\r]/gi, '');
    } else {
        return ('');
    }
}
function jsonToCsv(data) {
    var keys = Object.keys(data[0]),
        csv = [keys.join(",")];
    var row = new Array(keys.length);

    for (var i = 0; i < data.length; i++) {
        for (var j = 0; j < keys.length; j++) {
            if (typeof data[i][keys[j]] === 'string') {
                row[j] = '"' + escapeCsv(data[i][keys[j]]) + '"';
            } else {
                row[j] = data[i][keys[j]] || '';
            }
        }
        csv.push(row.join(','));
    }
    return csv.join("\n");
}
This alone yields a performance improvement of about 3-5x according to jsPerf.
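If you want to sanity-check the speedup on your own data, a rough local timing could look like the sketch below. Note that buildSampleData() is just a made-up helper that fabricates dummy rows, and jsonToCsv() refers to the rewritten function above; absolute numbers will vary with data shape and Node version.

// Rough local timing sketch.
// buildSampleData() is a hypothetical helper producing test rows.
function buildSampleData(n) {
    var rows = [];
    for (var i = 0; i < n; i++) {
        rows.push({ id: i, name: 'name, ' + i, note: 'multi\nline "text"' });
    }
    return rows;
}

var sample = buildSampleData(100000); // roughly the 1 lakh rows from the question

console.time('for-loop version');
jsonToCsv(sample);
console.timeEnd('for-loop version');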
If the CSV you are generating can be streamed to a file or to a client directly, you could improve this even further and reduce the memory load, as the whole CSV no longer has to be held in memory.
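To illustrate that idea, here is a minimal sketch of streaming the CSV to a file as it is built. It reuses escapeCsv() from above; jsonToCsvFile() and its parameters are made-up names, not an existing API.

var fs = require('fs');

// Minimal streaming sketch: each line is written to disk as soon as it is built,
// so the whole CSV never has to sit in memory at once.
// jsonToCsvFile() is a made-up name; it reuses escapeCsv() from above.
function jsonToCsvFile(data, filePath, done) {
    var keys = Object.keys(data[0]);
    var out = fs.createWriteStream(filePath);

    out.on('finish', done);          // fires once everything has been flushed
    out.write(keys.join(',') + '\n');

    for (var i = 0; i < data.length; i++) {
        var row = new Array(keys.length);
        for (var j = 0; j < keys.length; j++) {
            var value = data[i][keys[j]];
            row[j] = (typeof value === 'string') ? '"' + escapeCsv(value) + '"' : (value || '');
        }
        out.write(row.join(',') + '\n');
    }
    out.end();
}

For really large inputs you would additionally want to honour the return value of write() and wait for the 'drain' event before continuing, so the stream's internal buffer does not grow without bound.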
Fiddle to play around with the functions: the originals are named like yours, the new ones carry a 2 suffix.
Upvotes: 4