axel
axel

Reputation: 79

CSV Parsing Strings containing double quotes and commas

Apologies if this question is a duplicate.

I'm trying to create a CSV file with, as an example, a Title, Name and Comments column. Each row of the CSV has it's values for each column being read from the Database into the relevant variable, and these variables, split by commas, are then added to a variable 'newline' which was then appended to the csv as follows:

    string title = reader[0]
    string name = reader[1]
    string comments = reader[2]
    var csv = new StringBuilder();
    var headerLine = "Title,Name,Comments"
    csv.AppendLine(headerLine);

    var newline = title + "," + name + "," + comments;
    csv.AppendLine(newline);

Originally I took the above approach, but quickly ran into the problem where, if any of the fields read in from the database had commas in them, that field would be split in the middle, e.g. if comments = "Comment, goes, here" would be spread across 3 columns in the out CSV instead of 1.

To try and overcome this problem, I changed my code so that all fields would be wrapped in double quotes, meaning commas within a field would no longer split the CSV, as follows:

    var newline = "\"" + title + "\",\"" + name + "\",\"" + comments + "\""

Now, with this in place, I am getting a problem where if any of the values read in from the Database contain a double quote, this messes up everything, since a double quote from the database is displayed as \" when the field is converted into a string, which is what I'm already using to wrap the start and end of each field in double quotes to prevent commas splitting fields containing commas.

Upvotes: 4

Views: 7005

Answers (3)

Nelviticus
Nelviticus

Reputation: 151

You double up any existing double quotes that come from the database, so they're seen as being contained within the field (column) rather than signifying the beginning or end of a field.

See this answer for an example of CSV that contains both double quotes and commas within fields: https://stackoverflow.com/a/12473481/1184850

Upvotes: 1

juharr
juharr

Reputation: 32266

You need to escape the double quotes. What you escape them with depends on whatever you plan to read them with. It's possible that the escape sequence should be a backslash followed by a double quote, but if using Excel it will be two double quotes in a row. That can be done using string.Replace.

var newline = "\"" + title.Replace("\"", "\"\"") + 
              "\",\"" + name.Replace("\"", "\"\"") + 
              "\",\"" + comments.Replace("\"", "\"\"") + "\"";

Or using string.Format

var newline = string.Format(
    "\"{0}\",\"{1}\",\"{2}\"",
    title.Replace("\"", "\"\""),
    name.Replace("\"", "\"\""),
    comments.Replace("\"", "\"\""));

Upvotes: 4

Dmitrii Bychenko
Dmitrii Bychenko

Reputation: 186668

The rule is simple: if item contains either , (comma) or " (quotation mark) it should be put into quotation marks; each " within the item should be doubled:

  abcde    -> abcde        // or "abcde", quotation is not mandatory, however
  abc,de   -> "abc,de"
  abc"de   -> "abc""de"
  abc","de -> "abc"",""de"

possible implementation:

    public static String EncodeCsvItem(String value) {
      if (String.IsNullOrEmpty(value))
        return "";

      StringBuilder Sb = new StringBuilder(value.Length + 2);

      bool wrap = false;

      foreach (var ch in value) {
        if (ch == '"') {
          Sb.Append('"');
          wrap = true;
        }
        else if (ch == ',')
          wrap = true;

        Sb.Append(ch);
      }

      if (wrap) {
        Sb.Insert(0, '"');
        Sb.Append('"');
      }

      return Sb.ToString();
    }

....

   var newline = String.Join(",",
     EncodeCsvItem(reader[0]),
     EncodeCsvItem(reader[1]),
     EncodeCsvItem(reader[2]) 
   );

   csv.AppendLine(newline);

Upvotes: 3

Related Questions