Scott Chamberlain

Match regex pattern when not inside a set of quotes (text spans multiple lines)

This is a continuation of my previous question, ".NET regex engine returns no matches but I am expecting 8".

My query handles everything else perfectly and my capture groups are working great; however, I have found an edge case that I do not know how to handle.

Here is a test case that I am having trouble with.

INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');

Here is the pattern I am using, with the RegexOptions flags Singleline, Multiline, ExplicitCapture, and IgnorePatternWhitespace:

^\(
((('(?<s>.*?)'(?!')) |
 (?<n>-?[\d\.]+)
 )(\s,\s)?
)+
#(?<!'')   # commented out: Case 4 works; un-commented: Case 3 works
\)[;,]\r?$

I can either handle Case 3 or Case 4 but I am having trouble handling both.

If I had a way to check whether there is an even number of ' characters in the capture group s, I could tell whether we are at a real end of line or inside a text block containing a line whose ending just happens to match the pattern. However, I cannot figure out how to modify other examples to handle multi-line text strings.

Can what I want be done with a single regex query, or am I forced to do post-processing (using the commented case) and do this in two passes?


Here is the code to run it in LINQPad

string text = 
@"INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
";

const string recordRegex =
@"^\(
((('(?<s>.*?)'(?!')) |
 (?<n>-?[\d\.]+)
 )(\s,\s)?
)+
#(?<!'')   # commented out: Case 4 works; un-commented: Case 3 works
\)[;,]\r?$";

var records = Regex.Matches(text, recordRegex, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
records.Dump();

Answers (1)

Qtax

An expression like this would match such quotes:

(?:'[^']*')+
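As a quick check of that expression, here is a minimal C# sketch. The sample input and the QuoteDemo / FindQuoted names are made up for illustration. A doubled quote ('') inside a literal is simply two adjacent '...' runs, which is why the + repetition keeps a whole SQL string, escaped quotes and line breaks included, in a single match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // One or more adjacent '...' runs; a doubled quote ('') glues two runs together.
    public const string QuotedPattern = @"(?:'[^']*')+";

    // Returns every quoted literal found in the input, outer quotes included.
    public static string[] FindQuoted(string input) =>
        Regex.Matches(input, QuotedPattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Hypothetical sample: an empty string, a multi-line string, and an escaped quote.
        string sample = "(4 , ''), (2 , 'Multi\nLine'), (3 , 'odd end '');\nCase')";
        foreach (string literal in FindQuoted(sample))
            Console.WriteLine("[" + literal + "]");
    }
}
```

Because [^'] is a negated class, it also matches newlines, so no Singleline flag is needed for multi-line strings.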

If you want to match foo when it's not inside such quotes, you could use something like:

foo(?=[^']*(?:'[^']*'[^']*)*\z)
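A small C# sketch of this lookahead idea follows; the sample string and the OutsideQuotesDemo / UnquotedPositions names are invented for the example. The lookahead in the sketch allows zero or more quote pairs between the match and the end of input, so a foo with no quotes after it at all also counts as unquoted. It assumes the quotes in the input are balanced.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OutsideQuotesDemo
{
    // foo, followed by an even number of quotes (possibly zero) up to the end of input.
    public const string Pattern = @"foo(?=[^']*(?:'[^']*'[^']*)*\z)";

    // Returns the start index of every foo that is not inside a quoted string.
    public static int[] UnquotedPositions(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Index)
             .ToArray();

    public static void Main()
    {
        // Hypothetical sample: the second and fourth foo sit inside quoted strings.
        string sample = "foo 'foo' foo 'it''s foo' foo";
        Console.WriteLine(string.Join(", ", UnquotedPositions(sample)));
    }
}
```

The lookahead rescans to the end of the input for every candidate, so on large inputs this can be noticeably slower than consuming the quoted strings as part of the match.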

For "one match per line with the unquoted text and numbers as capture groups", something like this:

(?xm)^
\(

(?:
    (?:
        (?<quote> (?:'[^']*')+ )
    |   (?<num>   -?\d+(?:\.\d+)? )
    |   (?<x>     X'[0-9a-f]*' )
    )
    (?:\s*,\s*)?
)+

\)
[;,] 
\r?$
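To try the pattern above against the test case from the question, a sketch along these lines should work. The RecordDemo, Parse, and Unquote names are hypothetical, and the code assumes each record holds one number and one quoted string, as in the test data. The quote group captures the literal with its surrounding quotes, so Unquote strips them and collapses each '' back to a single '.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordDemo
{
    // Same pattern as above; (?xm) enables IgnorePatternWhitespace and Multiline inline.
    public const string RecordPattern = @"(?xm)^
\(
(?:
    (?:
        (?<quote> (?:'[^']*')+ )
    |   (?<num>   -?\d+(?:\.\d+)? )
    |   (?<x>     X'[0-9a-f]*' )
    )
    (?:\s*,\s*)?
)+
\)
[;,]
\r?$";

    // Strip the outer quotes and turn each doubled quote ('') back into a single one.
    public static string Unquote(string sqlLiteral) =>
        sqlLiteral.Substring(1, sqlLiteral.Length - 2).Replace("''", "'");

    // Returns { number, text } per record; text is null when the record has no quoted value.
    public static List<string[]> Parse(string sql)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(sql, RecordPattern))
        {
            // Group.Value is the group's last capture within this match.
            Group quote = m.Groups["quote"];
            rows.Add(new[] { m.Groups["num"].Value, quote.Success ? Unquote(quote.Value) : null });
        }
        return rows;
    }

    public static void Main()
    {
        string text =
@"INSERT INTO [Example] ( [CaseNumber] , [TestText] )
VALUES
(1 , 'Single Line Case'),
(2 , 'Multi
Line Case');
(3 , 'Two Lines with odd end '');
Case');
(4 , ''),
(5 , 'Case 3 is the Empty Text Case');
";
        foreach (string[] row in Parse(text))
            Console.WriteLine(row[0] + " => [" + row[1] + "]");
    }
}
```

If a record can hold several quoted values, m.Groups["quote"].Captures exposes every capture instead of only the last one.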
