Jake Smith

Reputation: 2813

How can I Split(',') a string while ignoring commas between quotes?

I am using the .Split(',') method on a string that I know has values delimited by commas and I want those values to be separated and put into a string[] object. This works great for strings like this:

78,969.82,GW440,.

But the values start to look different when that second value goes over 1000, like the one found in this example:

79,"1,013.42",GW450,....

These values come from a spreadsheet control, exported with the control's built-in ExportToCsv(...) method, which explains why I get a formatted version of the actual numerical value.

Question

Is there a way I can get the .Split(',') method to ignore commas inside quotes? I don't want the value "1,013.42" to be split into "1 and 013.42".

Any ideas? Thanks!

Update

I would really like to do this without incorporating a third-party tool. My use case doesn't involve many cases besides this one, and even though this is part of my work's solution, incorporating a tool like that doesn't benefit anyone at the moment. I was hoping there was a quick fix for this particular case that I was missing. Now that it is the weekend, I'll try to post one more update to this question on Monday with the solution I eventually come up with. Thank you, everyone, for your assistance so far; I will assess each answer further on Monday.

Upvotes: 12

Views: 19853

Answers (3)

Evan L

Reputation: 3855

This is a fairly straightforward CSV reader implementation we use in a few projects here. It is easy to use and handles the cases you are talking about.

First, the Csv class:

public static class Csv
{
    public static string Escape(string s)
    {
        if (s.Contains(QUOTE))
            s = s.Replace(QUOTE, ESCAPED_QUOTE);

        if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
            s = QUOTE + s + QUOTE;

        return s;
    }

    public static string Unescape(string s)
    {
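        // The length check keeps a lone quote character from reaching Substring(1, -1), which would throw.
        if (s.Length >= 2 && s.StartsWith(QUOTE) && s.EndsWith(QUOTE))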
        {
            s = s.Substring(1, s.Length - 2);

            if (s.Contains(ESCAPED_QUOTE))
                s = s.Replace(ESCAPED_QUOTE, QUOTE);
        }

        return s;
    }


    private const string QUOTE = "\"";
    private const string ESCAPED_QUOTE = "\"\"";
    private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };

}

Then a pretty nice reader implementation, if you need it. You should be able to do what you need with just the Csv class above.

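using System.IO;                      // FileStream, Stream, StreamReader, TextReader
using System.Text.RegularExpressions; // Regex

public sealed class CsvReader : System.IDisposable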
{
    public CsvReader(string fileName)
        : this(new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
    }

    public CsvReader(Stream stream)
    {
        __reader = new StreamReader(stream);
    }

    public System.Collections.IEnumerable RowEnumerator
    {
        get
        {
            if (null == __reader)
                throw new System.ApplicationException("I can't start reading without CSV input.");

            __rowno = 0;
            string sLine;
            string sNextLine;

            while (null != (sLine = __reader.ReadLine()))
            {
                while (rexRunOnLine.IsMatch(sLine) && null != (sNextLine = __reader.ReadLine()))
                    sLine += "\n" + sNextLine;

                __rowno++;
                string[] values = rexCsvSplitter.Split(sLine);

                for (int i = 0; i < values.Length; i++)
                    values[i] = Csv.Unescape(values[i]);

                yield return values;
            }

            __reader.Close();
        }

    }

    public long RowIndex { get { return __rowno; } }

    public void Dispose()
    {
        if (null != __reader) __reader.Dispose();
    }

    //============================================


    private long __rowno = 0;
    private TextReader __reader;
    private static Regex rexCsvSplitter = new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");
    private static Regex rexRunOnLine = new Regex(@"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$");

}

Then you can use it like this.

var reader = new CsvReader(new FileStream(file, FileMode.Open));

Note: This would open an existing CSV file, but can be modified fairly easily to take a string[] like you need.
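
For the single-string case in the question, the same splitter regex can be applied directly to one line, without the file reader. The following is a minimal sketch under that assumption; the class name QuotedSplit and the method SplitLine are hypothetical names chosen for illustration, and the quote stripping mirrors what Csv.Unescape does above. It assumes the whole record is on one line and that quotes are balanced.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer's code.
public static class QuotedSplit
{
    // Same pattern as rexCsvSplitter above: a comma matches only when
    // the rest of the line contains an even number of quotes, i.e. the
    // comma sits outside any quoted field.
    private static readonly Regex Splitter =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    public static string[] SplitLine(string line)
    {
        string[] values = Splitter.Split(line);

        for (int i = 0; i < values.Length; i++)
        {
            string v = values[i];

            // Strip the surrounding quotes and collapse doubled quotes.
            if (v.Length >= 2 && v[0] == '"' && v[v.Length - 1] == '"')
                values[i] = v.Substring(1, v.Length - 2).Replace("\"\"", "\"");
        }

        return values;
    }
}
```

Calling QuotedSplit.SplitLine("79,\"1,013.42\",GW450") returns three fields: 79, 1,013.42 and GW450.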

Upvotes: 11

Paweł Bejger

Reputation: 6366

You should probably read this article: Regular Expression for Comma Based Splitting Ignoring Commas inside Quotes. Although it is written for Java, the regular expression is the same.

Upvotes: 1

user27414

Reputation:

Since you're reading a CSV file, the best course of action would be to use an existing CSV reader. There's more to CSV than just commas between quotes. Finding all of the cases you need to handle would be more work than it's worth.

Here's a CSV reader question on SO.
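
If the objection to an existing reader is the third-party dependency, one option that ships with .NET is TextFieldParser in the Microsoft.VisualBasic.FileIO namespace. This answer does not name it, so treat the following as a sketch of one possible route rather than the recommended reader; the class name BuiltInCsv and the method ParseLine are hypothetical. On .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper for illustration only.
public static class BuiltInCsv
{
    // Parses one CSV record from a string.
    // Returns null when the input contains no data, because
    // TextFieldParser.ReadFields returns null at end of input.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            // Treat quoted fields as single values, so commas inside
            // quotes are not used as delimiters.
            parser.HasFieldsEnclosedInQuotes = true;

            return parser.ReadFields();
        }
    }
}
```

For the line 79,"1,013.42",GW450 this returns the fields 79, 1,013.42 and GW450, with the quotes already removed.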

Upvotes: 3
