Andrei

Reputation: 44550

C# split comma separated values

How can I split a comma-separated string in which quoted values can themselves contain commas?

Example input:

John, Doe, "Sid, Nency", Smith

Expected output:

Splitting on commas worked fine, but I now have a requirement that strings like "Sid, Nency" are allowed. I tried using a regex to split such values. The regex ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" comes from a Java question, and it does not work well in my .NET code: it duplicates some strings, returns extra results, and so on.

So what is the best way to split such strings?

Upvotes: 2

Views: 2874

Answers (4)

lightbricko

Reputation: 2709

If you find the regular expression too complex you can do it like this:

string initialString = "John, Doe, \"Sid, Nency\", Smith";

IEnumerable<string> splitted = initialString.Split('"');
splitted = splitted.SelectMany((str, index) => index % 2 == 0 ? str.Split(',') : new[] { str });
splitted = splitted.Where(str => !string.IsNullOrWhiteSpace(str)).Select(str => str.Trim());

Upvotes: 0

peter.petrov

Reputation: 39437

Just walk through the string once, keeping track of whether you are
inside a quoted "block" or not. If you are, don't treat a comma as a
separator; otherwise, do. It's a simple algorithm, and I would write it
myself. The first " you encounter opens a block, the next " closes it,
and so on. So you can do it in one pass through the string.
The example below is written in Java; the same logic carries over to C#.

import java.util.ArrayList;


public class Test003 {

    public static void main(String[] args) {
        String s = "  John, , , , \" Barry, John  \" , , , , , Doe, \"Sid ,  Nency\", Smith  ";

        StringBuilder term = new StringBuilder();
        boolean inQuote = false;
        boolean inTerm = false;
        ArrayList<String> terms = new ArrayList<String>();
        for (int i=0; i<s.length(); i++){
            char ch = s.charAt(i);
            if (ch == ' '){
                if (inQuote){
                    if (!inTerm) { 
                        inTerm = true;
                    }
                    term.append(ch);
                }
                else {
                    if (inTerm){
                        terms.add(term.toString());
                        term.setLength(0);
                        inTerm = false;
                    }
                }
            }else if (ch== '"'){
                term.append(ch); // keeps the quote character in the term; remove this line to drop the quotes
                if (!inTerm){
                    inTerm = true;
                }
                inQuote = !inQuote;
            }else if (ch == ','){
                if (inQuote){
                    if (!inTerm){
                        inTerm = true;
                    }
                    term.append(ch);
                }else{
                    if (inTerm){
                        terms.add(term.toString());
                        term.setLength(0);
                        inTerm = false;
                    }
                }
            }else{
                if (!inTerm){
                    inTerm = true;
                }
                term.append(ch);
            }
        }

        if (inTerm){
            terms.add(term.toString());
        }

        for (String t : terms){
            System.out.println("|" + t + "|");
        }

    }



}

Upvotes: 1

Jerry

Reputation: 71538

It's because of the capture group. Just turn it into a non-capture group:

@",(?=(?:[^""]*""[^""]*"")*[^""]*$)"
       ^^

Regex.Split adds the text captured by a capture group to the result array, which is why you get duplicated strings.


var regexObj = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");
// ForEach is a List<T> method, not a LINQ operator, so materialize the sequence first.
regexObj.Split(input).Select(s => s.Trim('\"', ' ')).ToList().ForEach(Console.WriteLine);

The Trim call then strips the surrounding quotes and spaces from each result.
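
To see the effect of the capture group, here is a minimal sketch that runs both patterns on the example input from the question. It relies on the documented behavior of Regex.Split, which adds text captured by capturing groups to the returned array. The class name SplitDemo and the variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original answer.
public static class SplitDemo
{
    public static void Main()
    {
        string input = "John, Doe, \"Sid, Nency\", Smith";

        // Pattern from the question: the lookahead contains a capturing group.
        var capturing = new Regex(@",(?=([^""]*""[^""]*"")*[^""]*$)");
        // Same pattern with the group made non-capturing via (?: ... ).
        var nonCapturing = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

        string[] withCaptures = capturing.Split(input);
        string[] clean = nonCapturing.Split(input)
            .Select(s => s.Trim('"', ' '))   // strip surrounding quotes and spaces
            .ToArray();

        // Compare how many entries each pattern produces.
        Console.WriteLine("capturing: " + withCaptures.Length + " entries");
        Console.WriteLine("non-capturing: " + clean.Length + " entries");

        foreach (string field in clean)
        {
            Console.WriteLine(field);
        }
    }
}
```

With the non-capturing pattern the input splits into four fields: John, Doe, Sid, Nency and Smith. The capturing pattern should report a larger entry count, because the captured text is mixed into the array.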

Upvotes: 4

Moo-Juice

Reputation: 38825

I use the following code within my Csv Parser class to achieve this:

    private string[] ParseLine(string line)
    {
        List<string> results = new List<string>();
        bool inQuotes = false;
        int index = 0;
        StringBuilder currentValue = new StringBuilder(line.Length);
        while (index < line.Length)
        {
            char c = line[index];
            switch (c)
            {
                case '\"':
                    {
                        inQuotes = !inQuotes;
                        break;
                    }

                default:
                    {
                        if (c == ',' && !inQuotes)
                        {
                            results.Add(currentValue.ToString());
                            currentValue.Clear();
                        }
                        else
                            currentValue.Append(c);
                        break;
                    }
            }
            ++index;
        }

        results.Add(currentValue.ToString());
        return results.ToArray();
    }   // eo ParseLine
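
A usage sketch for this kind of quote-toggling parser, applied to the input from the question. The class name CsvLineDemo and the helper SplitLine are hypothetical. SplitLine is a condensed restatement of the same loop, with a Trim added on the assumption that the space following each comma is not wanted in the output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class, not part of the original answer.
public static class CsvLineDemo
{
    // Hypothetical helper: condensed form of the quote-toggling loop.
    public static List<string> SplitLine(string line)
    {
        var results = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // A quote toggles the state and is not copied to the field.
                inQuotes = !inQuotes;
            }
            else if (c == ',' && !inQuotes)
            {
                // An unquoted comma ends the current field.
                results.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                // Ordinary character, or a comma inside quotes.
                current.Append(c);
            }
        }

        // The last field has no trailing comma, so add it explicitly.
        results.Add(current.ToString().Trim());
        return results;
    }

    public static void Main()
    {
        foreach (string field in SplitLine("John, Doe, \"Sid, Nency\", Smith"))
        {
            Console.WriteLine("|" + field + "|");
        }
    }
}
```

This prints |John|, |Doe|, |Sid, Nency| and |Smith|, one per line. Like the loop above, the sketch does not treat a doubled quote inside a quoted field as an escaped quote.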

Upvotes: 0
