Reputation: 127543
This is a continuation of my previous question, ".NET regex engine returns no matches but I am expecting 8".
My query handles everything else perfectly and my capture groups work well, but I have found an edge case that I do not know how to handle.
Here is a test case that I am having trouble with:
INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
Here is the pattern I am using, with the RegexOptions flags Singleline, Multiline, ExplicitCapture, and IgnorePatternWhitespace:
^\(
((('(?<s>.*?)'(?!')) |
(?<n>-?[\d\.]+)
)(\s,\s)?
)+
#(?<!'') #Commented Case 3 works, un-commented case 2 works
\)[;,]\r?$
I can handle either Case 3 or Case 4, but not both at once.
If I had a way to check whether the s capture group contains an even number of ' characters, I could tell whether we are at a real end of line or inside a text block that has a line which just happens to match the pattern. However, I cannot figure out how to adapt other examples to handle multi-line text strings.
Can this be done with a single regex, or am I forced to post-process (using the commented case) and do it in two passes?
Here is the code to run it in LINQPad:
string text =
@"INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
";
const string recordRegex =
@"^\(
((('(?<s>.*?)'(?!')) |
(?<n>-?[\d\.]+)
)(\s,\s)?
)+
#(?<!'') #Commented Case 3 works, un-commented case 2 works
\)[;,]\r?$";
var records = Regex.Matches(text, recordRegex,
    RegexOptions.Singleline | RegexOptions.Multiline |
    RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
records.Dump(); // LINQPad's Dump() displays the MatchCollection
Upvotes: 2
Views: 1179
Reputation: 33908
An expression like this would match single-quoted strings in which a literal quote is escaped by doubling it (''):
(?:'[^']*')+
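A quick way to see what this expression does is to run it over a few SQL-style literals. The C# sketch below uses a made-up sample string; it shows that a doubled quote ('') is absorbed as two adjacent '...' runs, that [^']* crosses line breaks without needing RegexOptions.Singleline, and that an empty literal is matched as well.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One or more '...' runs glued together: an escaped quote ('') ends one
// run and immediately starts the next, so it stays inside the match.
var literal = new Regex(@"(?:'[^']*')+");

// Hypothetical sample: a plain literal, a literal with an escaped quote
// that also spans a line break, and an empty literal.
string sample = "'plain', 'it''s\nmultiline', ''";

string[] found = literal.Matches(sample)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string f in found)
    Console.WriteLine($"[{f}]");
```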
If you want to match foo when it's not inside such quotes, you could use something like:
foo(?=[^']*(?:'[^']*'[^']*)*\z)
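The lookahead relies on parity: a doubled quote contributes two quote characters, so whether foo sits inside a literal depends only on whether an odd or even number of quote characters follows it. The C# sketch below uses a hypothetical input and assumes the quotes in it are balanced. It repeats the quote pairs with * (zero or more), so that zero remaining quotes also counts as even and a foo after the last literal still matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// foo followed by an even number (zero included) of quote characters up to
// the end of the input (\z); with balanced quotes that means foo is unquoted.
var unquotedFoo = new Regex(@"foo(?=[^']*(?:'[^']*'[^']*)*\z)");

// Hypothetical input: foo outside quotes at 0 and 28, inside a literal at 9.
string input = "foo = 'a foo isn''t here' + foo";

int[] positions = unquotedFoo.Matches(input)
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

Console.WriteLine(string.Join(", ", positions)); // 0, 28
```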
To get one match per record, with the string literals and the numbers as capture groups, you could use something like this:
(?xm)^
\(
(?:
(?:
(?<quote> (?:'[^']*')+ )
| (?<num> -?\d+(?:\.\d+)? )
| (?<x> X'[0-9a-f]*' )
)
(?:\s*,\s*)?
)+
\)
[;,]
\r?$
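To check this record pattern against the question's test data, something like the following should work as a console program (C# top-level statements, .NET 6 or later assumed). The Unquote helper is a hypothetical addition, not part of the pattern: it strips the outer quotes and turns each '' back into a single quote. No RegexOptions are passed because (?xm) enables free-spacing and multiline mode inline, and Singleline is unnecessary since the pattern contains no dot.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The record pattern from the answer; (?xm) turns on free-spacing and
// multiline mode inline.
const string recordPattern = @"(?xm)^
\(
(?:
  (?:
    (?<quote> (?:'[^']*')+ )
  | (?<num> -?\d+(?:\.\d+)? )
  | (?<x> X'[0-9a-f]*' )
  )
  (?:\s*,\s*)?
)+
\)
[;,]
\r?$";

// Test data from the question; line endings are normalized to \n so the
// result does not depend on how this source file was saved.
string text = @"INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
".Replace("\r\n", "\n");

var records = Regex.Matches(text, recordPattern).Cast<Match>().ToList();

foreach (Match r in records)
{
    string num = r.Groups["num"].Value;
    string value = r.Groups["quote"].Success ? Unquote(r.Groups["quote"].Value) : "";
    Console.WriteLine($"{num}: [{value}]");
}

// Hypothetical helper: drop the outer quotes and turn each '' into '.
static string Unquote(string literal) =>
    literal.Substring(1, literal.Length - 2).Replace("''", "'");
```

Case 3 comes out as a single record because the greedy (?:'[^']*')+ keeps consuming quote runs past the ''); that ends the first line, so the record only closes at the real ); on the next line.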
Upvotes: 1