Doug Lerner

Reputation: 1543

How to split a string containing CSV data with arbitrary text into a JavaScript Array of Arrays?

I have a long string containing CSV data from a file. I want to store it in a JavaScript Array of Arrays. But one column has arbitrary text in it. That text could contain double-quotes and commas.

Splitting the CSV string into separate row strings is no problem:

var theRows = theCsv.split(/\r?\n/);

But then how would I best split each row?

Since it's CSV data I need to split on commas. But

var theArray = new Array();
for (var i=0; i<theRows.length; i++) {
    theArray[i] = theRows[i].split(',');    
}

won't work for elements containing quotes and commas, like this example:

512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media

How can I make sure that 2nd element gets properly stored in a single array element as

 "Fake News" and the "Best Way" to deal with A, B, and C

Thanks.

The suggested solution which looked similar unfortunately did not work when I tried the CSVtoArray function there. Instead of returning array elements, a null value was returned, as described in my comment below.

Upvotes: 6

Views: 1390

Answers (1)

MaximeW

Reputation: 430

This should do it:

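let parseRow = function(row) {
  let isInQuotes = false; // true while we are inside a quoted field
  let values = [];        // fields completed so far
  let val = '';           // field currently being built

  for (let i = 0; i < row.length; i++) {
    switch (row[i]) {
      case ',':
        if (isInQuotes) {
          // A comma inside quotes is part of the field
          val += row[i];
        } else {
          // A comma outside quotes ends the field
          values.push(val);
          val = '';
        }
        break;

      case '"':
        if (isInQuotes && i + 1 < row.length && row[i+1] === '"') {
          // "" inside quotes is an escaped literal quote
          val += '"';
          i++;
        } else {
          // Any other quote opens or closes a quoted field
          isInQuotes = !isInQuotes;
        }
        break;

      default:
        val += row[i];
        break;
    }
  }

  // The last field has no trailing comma, so push it here
  values.push(val);

  return values;
}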

It will return the values in an array:

parseRow('512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media');
// => ['512', '"Fake News" and the "Best Way" to deal with A, B, and C', ' 1/18/2019', 'media']

To get the requested array of arrays you can do:

let parsedCsv = theCsv.split(/\r?\n/).map(parseRow);

Explanation

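The code might look a little obscure, but the principle is simple: we parse the string character by character. When we encounter an opening " we set isInQuotes = true. This changes how , and "" are handled: inside quotes, a comma is kept as part of the value instead of ending it, and a doubled "" is read as one literal ". When we encounter a single " (one that is not followed by another ") we set isInQuotes = false again.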
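One caveat: splitting on /\r?\n/ before parsing assumes no field contains a line break. If the arbitrary text column can contain quoted line breaks, the same isInQuotes idea can be applied to the whole string instead of to one row at a time. The following is only a sketch under that assumption; parseCsv is a hypothetical name, not part of the code above, and it has not been checked against every CSV dialect.

```javascript
// Hypothetical helper (not from the answer above): parses a whole CSV string
// with the same quote-tracking state, so line breaks inside quoted fields
// stay part of the field instead of starting a new row.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let val = '';
  let isInQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (ch === '"') {
      if (isInQuotes && text[i + 1] === '"') {
        // "" inside quotes is an escaped literal quote
        val += '"';
        i++;
      } else {
        // Opening or closing quote
        isInQuotes = !isInQuotes;
      }
    } else if (isInQuotes) {
      // Inside quotes everything else is literal, including , and line breaks
      val += ch;
    } else if (ch === ',') {
      row.push(val);
      val = '';
    } else if (ch === '\n' || ch === '\r') {
      // Treat \r\n as a single row terminator
      if (ch === '\r' && text[i + 1] === '\n') i++;
      row.push(val);
      rows.push(row);
      row = [];
      val = '';
    } else {
      val += ch;
    }
  }

  // Flush the last row unless the input ended with a line break
  if (val !== '' || row.length > 0) {
    row.push(val);
    rows.push(row);
  }
  return rows;
}
```

With this, let parsedCsv = parseCsv(theCsv); would take the place of the split-and-map line. For rows without embedded line breaks it should produce the same fields as parseRow.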

Upvotes: 3
