Hitesh
Hitesh

Reputation: 137

Json to csv in node.js

I am trying to convert a very large json to csv in node.js but it is taking too much time and also leading to 100% cpu at the time of conversion.

  jsonToCsv: function (data) {
    var keys = Object.keys(data[0]);
    var csv = [keys.join(",")];
    console.time("CSVGeneration");
    data.forEach(function (row) {
      var line = ''; 
      keys.forEach(function (key) {
        if (typeof row[key] === 'string') {
          row[key] = "" + file_utils.escapeCsv(row[key]) + "";
        } 
        line += row[key] + ",";
      });
      csv.push(line);  
    });
    console.timeEnd("CSVGeneration");
    csv = csv.join("\n");
    return csv;
  },
  escapeCsv: function (x) {
    if (x)
      return ('' + x.replace(/"/g, '').replace(/,/g, ' ').replace(/\n/g, " ").replace(/\r/g, " ") + '');
    else
      return ('');
  },

On an average run for 1Lac rows, it never recovered to even log time. I had to kill the process manually.

Can someone suggest a better alternative to this?

Upvotes: 2

Views: 2978

Answers (1)

Sirko
Sirko

Reputation: 74046

Before answering this: Assuming your code is working, this question belongs to https://codereview.stackexchange.com/ .

As for your problem:

  • the new Array access functions like forEach(), while being rather comfortable when coding, are usually not quite performant. A simple for loop is the better choice in performance critical situations.
  • in escapeCsv() you apply 4 different regex replacements each for just one character. Combine those into one.
  • Assuming you data is already structured in a way, that allows for Csv conversion (data is an Array of objects, each having the same properties), it is not necessary to retrieve the keys individually for each object.

Applying this, yields the following code:

function escapeCsv(x) {
    if (x) {
        return ('' + x).replace( /[",\n\r]/gi, '' );
    } else {
        return ('');
    }
}

function jsonToCsv(data) {
    var keys = Object.keys(data[0]),
        csv = [keys.join(",")];

    var row = new Array( keys.length );
    for (var i = 0; i < data.length; i++) {
        for (var j = 0; j < keys.length; j++) {
            if (typeof data[i][keys[j]] === 'string') {
                row[j] = '"' + escapeCsv(data[i][keys[j]]) + '"';
            } else {
                row[j] = data[i][keys[j]] || '';
            }
        }
        csv.push(row.join(','));
    }

    return csv.join("\n");
}

This alone yields a performance improvement for about 3-5 according to jsPerf.

If the CSV, you are generating can be streamed to a file or to a client directly, one could improve even more and reduce the memory load, as not the CSV has to be stored in memory.

Fiddle to play around with the functions Original named like yours, new ones names with a suffix 2.

jsPerf.com comparison

Upvotes: 4

Related Questions