Reputation: 129
I have a regex pattern defined as
var pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
and I am trying to split some CSV-like strings into fields.
Some example strings that WORK with this regex are:
_input[0] = ""; // expected single blank field
_input[1] = "A,B,C"; // expected three individual fields
_input[2] = "\"A,B\",C"; // expected two fields 'A,B' and C
_input[3] = "\"ABC\"\",\"Text with,\""; // expected two fields, 'ABC"', 'Text with,'
_input[4] = "\"\",ABC\",\"next_field\""; // expected two fields, '",ABC', 'next_field'
However, this one is not working:
_input[5] = "\"\"\",ABC\",\"next_field\"";
I am expecting three fields:
'"', 'ABC"', 'next_field'
But I am getting two fields:
'"",ABC', 'next_field'
Can anybody help with this regex?
I think the strange part is that the second column doesn't have quotes at the start and end of the value, just at the end. So the first column's value is empty, and the second column is ABC"
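The behaviour described above can be reproduced with a short sketch. It assumes the split is done with `Regex.Split`, which the question does not actually show, and the helper name `SplitFields` is purely illustrative. The pieces are printed as the regex returns them, with the enclosing quotes still attached.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question: a comma is a separator only when an even
// number of double quotes follows it.
var pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";

// Hypothetical helper (not from the original post): returns the raw pieces,
// without stripping or unescaping any quotes.
string[] SplitFields(string input) => Regex.Split(input, pattern);

var samples = new[]
{
    "A,B,C",                          // _input[1]
    "\"A,B\",C",                      // _input[2]
    "\"\"\",ABC\",\"next_field\""     // _input[5], the failing case
};

foreach (var sample in samples)
{
    // Show the number of pieces and the pieces themselves.
    var fields = SplitFields(sample);
    Console.WriteLine($"{fields.Length}: {string.Join(" | ", fields)}");
}
```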
Thanks, Rob
Upvotes: 3
Views: 1784
Reputation: 2104
I think you need to be more specific about how the double quotes should be treated, because your requirements appear to conflict with each other.
My quick version, which I think comes closest to what you are trying to achieve, is below. Please note that 1) the double quotes are not escaped, because I used an external tool to validate the regex, and 2) I have changed how the matched values are retrieved; see the bottom for an example:
(?<Match>(?:"[^"]*"+|[^,])*)(?:,(?<Match>(?:"[^"]*"+|[^,])*))*
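It has the following logic: a field is any run of quoted segments and non-comma characters, where a quoted segment is a double quote, any number of non-quote characters, and then one or more closing double quotes. A comma inside a quoted segment stays in the field; any other comma starts the next field.
The above logic conflicts with what you expect from indexes 4 and 5, however, because I get: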
[4] = '""' and 'ABC","next_field"'
[5] = '"""' and 'ABC","next_field"'
If you could point out why the above logic is wrong for your needs/expectations, I'll edit my answer with a fully working regex.
To retrieve your values, you could do it like this:
using System;
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"(?<Match>(?:""[^""]*""+|[^,])*)(?:,(?<Match>(?:""[^""]*""+|[^,])*))*";
// The six inputs from the question as verbatim strings ("" is one literal quote).
string[] testCases = new[]{
    @"",
    @"A,B,C",
    @"""A,B"",C",
    @"""ABC"""",""Text with,""",
    @""""",ABC"",""next_field""",
    @""""""",ABC"",""next_field"""
};
foreach (string testCase in testCases){
    var match = Regex.Match(testCase, pattern);
    // Both groups are named "Match", so every field lands in one capture list.
    string[] matchedValues = match.Groups["Match"].Captures
        .Cast<Capture>()
        .Select(c => c.Value)
        .ToArray();
    Console.WriteLine(string.Join(" | ", matchedValues));
}
Upvotes: 3