Reputation: 26078
I have a text file that feeds me comma-separated data in which every value is enclosed in double quotes, like so:
string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";
I want to capture every value. Originally I simply split on commas, but I noticed that some values contain commas between the quotes, so I switched to a very simple regex that pulls out everything between quotes:
string pattern = "\"[^\"]*\"";
In RegexPal this matches exactly what I want, but when I run this small program in C#, I get back a list of commas instead of the values I'm actually interested in, and I'm not sure why. Can anyone spot my error?
string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";
string pattern = "\"[^\"]*\"";
string[] lines = Regex.Split(test, pattern); //Returns a list of commas in quotes
Upvotes: 1
Views: 373
Reputation: 726839
This is because Regex.Split uses the pattern to decide where the sequence must be split. In other words, the pattern describes separators, not the content that you would like to capture:
Splits an input string into an array of substrings at the positions defined by a regular expression pattern
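To make that concrete, here is a minimal sketch of what the call in the question does. The class name SplitDemo and the helper SplitOnQuoted are made up for illustration; only Regex.Split itself comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitDemo
{
    // Hypothetical helper: splits the input using the quoted-value pattern
    // from the question as the separator.
    public static string[] SplitOnQuoted(string input)
    {
        string pattern = "\"[^\"]*\"";
        return Regex.Split(input, pattern);
    }

    static void Main()
    {
        string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";

        // Every quoted value is consumed as a separator, so only the text
        // between the values survives the split.
        foreach (string part in SplitOnQuoted(test))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

For the sample input this prints six entries: an empty string, four commas, and another empty string. The empty entries appear because the input begins and ends with a match.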
To use the expression the way you want, you need to call Regex.Matches to obtain a MatchCollection, and retrieve the individual matches from that collection:
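using System;
using System.Text.RegularExpressions;
string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\"";
string pattern = "\"[^\"]*\"";
// Matches returns every substring that matches the pattern
MatchCollection mc = Regex.Matches(test, pattern);
foreach (Match m in mc) {
    // m.Value is the matched text, including the surrounding quotes
    Console.WriteLine(m.Value);
}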
Here is a demo on ideone.
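If you want the values without the surrounding quotes, one option is to add a capture group to the same pattern and read the group instead of the whole match. This is only a sketch: the class MatchesDemo and the helper ExtractValues are hypothetical names, and it assumes a value never contains an escaped double quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class MatchesDemo
{
    // Hypothetical helper: returns the text inside each pair of double quotes.
    // Assumes quotes are never escaped inside a value.
    public static List<string> ExtractValues(string line)
    {
        var values = new List<string>();
        // Group 1 captures the text between the quotes.
        foreach (Match m in Regex.Matches(line, "\"([^\"]*)\""))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    static void Main()
    {
        // Same sample as the question, plus one value with an embedded comma.
        string test = "\"foo\",\"bar\",\"1\",\"\",\"baz\",\"a,b\"";
        foreach (string value in ExtractValues(test))
        {
            Console.WriteLine("[" + value + "]");
        }
    }
}
```

The embedded comma in the last value stays inside a single entry, which is the case that plain splitting on commas got wrong.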
Upvotes: 4