micknt
micknt

Reputation: 297

Splitting comma delimited csv file with inline commas

I am splitting up a comma delimited CSV-file by using this code below. It works fine, but some of the records in the file contains a comma within the data, even though comma is reserved as the delimiter. For example a record contains a description text where the user has entered commas and my script enterprets that like the record ends. How can I somehow get over that?

Here's an example of a line in my CSV-file. Please remark the empty records at the end of the lines can be witout the "" enclosing characters:

"1","34353434","Adam","Traxx","343","This man is the boss, please handle with respect","",,"","",,","","0",,"

This is how I read and split:

Sub ReadOnlineExample()
Dim line_read As String
Dim tempfilename As String
Dim i As Integer
Dim rows_skipped As Integer
Dim line_split(0 To 125) As String
Dim MortalityRates(0 To 125) As Double

 tempfilename = "C:\MortalityRateTable.csv"
 Dim sr As New System.IO.StreamReader(tempfilename)

'Split the line into individual data
line_split = line_read.Split(",")

'Save the split data into an array
For i = 0 To UBound(line_split)
  MortalityRates(i) = line_split(i)
Next i

End Sub

Can anyone help? Maybe it's just a simple solution I just can't see :-)

Upvotes: 1

Views: 4586

Answers (2)

Atiris
Atiris

Reputation: 2703

Your example is probably not correct.
Last part after sentence say: "",,"","",,","","0",,"

There is two base conditon:

  1. delimiter mark invoke enclosint whole inner string to quote mark (if you have x,y, and delimiter is comma, in CSV you get "x,y")
  2. quote mark is escaped as double quote mark (if you have They say "Hey!" in CSV you get "They say ""Hey!""")

Other strings will be save as you type it.

And you can not transfer your last part with this rules. Try change commas to semicolon and import into Excel. Excel displays your example as follows:

enter image description here

In my country is standard CSV delimiter semicolon.
I write this code for parsing CSV. You can try modify it for your purposes (delimiter is variable):

    public List<string> ParseCsvRow(char delimiter, string input)
    {
        List<string> result = new List<string>();
        string step = "";
        bool escaped = false;
        StringBuilder stringBuilder = new StringBuilder();
        int position = -1;

        do
        {
            if (++position >= input.Length)
            {
                result.Add(stringBuilder.ToString());
                stringBuilder = null;
                break;
            }
            step = input.Substring(position, 1);
            switch (step)
            {
                case "\"":
                    if (stringBuilder.Length == 0 && !escaped)
                    {
                        escaped = true;
                        continue;
                    }
                    if (position + 1 < input.Length)
                        step = input.Substring(++position, 1);
                    else
                        step = "";
                    if (step == "\"")
                    {
                        stringBuilder.Append("\"");
                        continue;
                    }
                    if (step.Equals(delimiter.ToString()) && escaped)
                    {
                        result.Add(stringBuilder.ToString());
                        stringBuilder.Clear();
                        escaped = false;
                        continue;
                    }
                    break;
                default:
                    if (step.Equals(delimiter.ToString()) && !escaped)
                    {
                        result.Add(stringBuilder.ToString());
                        stringBuilder.Clear();
                        continue;
                    }
                    stringBuilder.Append(step);
                    continue;
            }
        } while (true);
        return result;
    }

Upvotes: 0

Oded
Oded

Reputation: 499002

Use a proper CSV parser - File Helpers is a popular option.

There is also a parser from Microsoft - in the Microsoft.VisualBasic.FileIO namespace, the TextFieldParser.

These take care of the subtle aspects of CSV that most hand rolled solutions don't consider until hitting an issue.

Upvotes: 6

Related Questions