Kevin DiTraglia

Reputation: 26078

RegEx split not giving expected results

I have a text file that feeds me comma-separated data in which every value is enclosed in double quotes, like so:

string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";

I want to capture every value. Originally I simply split on commas, but I noticed that some values contain commas between the quotes, so I switched to a very simple regex that pulls out everything between quotes:

string pattern = "\"[^\"]*\"";

In regexpal this matches exactly what I want, but when I run this small program in C#, I get back a list of commas instead of the values I'm actually interested in, and I'm not sure why. Can anyone spot my error?

string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";
string pattern = "\"[^\"]*\"";
string[] lines = Regex.Split(test, pattern); //Returns a list of commas in quotes

Upvotes: 1

Views: 373

Answers (1)

Sergey Kalinichenko

Reputation: 726839

This is because Regex.Split uses the pattern to decide where the sequence must be split. In other words, the pattern describes separators, not the content that you would like to capture:

Splits an input string into an array of substrings at the positions defined by a regular expression pattern
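To make that concrete, here is a minimal sketch of what the call from the question produces. The class name SplitDemo and the helper SplitOnQuoted are hypothetical names introduced only for this illustration; the pattern and input are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits the input using the question's pattern,
    // so every quoted value is treated as a separator and thrown away.
    public static string[] SplitOnQuoted(string input)
    {
        return Regex.Split(input, "\"[^\"]*\"");
    }

    static void Main()
    {
        string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";

        // Brackets make the empty strings visible in the output.
        foreach (string part in SplitOnQuoted(test))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

Because the input starts and ends with a match, the result has six elements: an empty string, the four commas that sit between the quoted values, and another empty string.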

To use the expression the way you want, you need to call Regex.Matches to obtain a MatchCollection and then retrieve the individual matches from that collection:

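string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";
string pattern = "\"[^\"]*\"";
// Regex.Matches returns every substring that matches the pattern.
MatchCollection mc = Regex.Matches(test, pattern);
foreach (Match m in mc) {
    Console.WriteLine(m.Value); // the matched text, surrounding quotes included
}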

Here is a demo on ideone.
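If you want the values without the surrounding quotes, one option is to add a capturing group around the part between the quotes and read that group from each match. This is only a sketch under a few assumptions: QuotedFields and Extract are hypothetical names, and the pattern is the one from the question with parentheses added. It does not handle escaped quotes inside a value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuotedFields
{
    // Hypothetical helper: returns the contents of each double-quoted
    // field, without the quotes themselves.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();

        // Group 1 captures only the text between the quotes.
        foreach (Match m in Regex.Matches(input, "\"([^\"]*)\""))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    static void Main()
    {
        string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";

        foreach (string value in Extract(test))
        {
            Console.WriteLine(value);
        }
    }
}
```

A value such as "a,b" comes back as a single entry, because the comma sits inside the quotes and is therefore part of the match rather than a boundary.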

Upvotes: 4
