Archie

Reputation: 2579

Regex to match all except a string in quotes in C#

I am a novice with regex usage in C#. I want a regex that finds the next keyword from a given list, but only when that keyword is not surrounded by quotes.

E.g., if I have code that looks like:

            while (t < 10)
            {
                string s = "get if stmt";
                u = GetVal(t, s);
                for(;u<8;u++)
                {
                    t++;
                }

            }

I tried using the regex @"(.*?)\s(FOR|WHILE|IF)\s", but it gives me "if" as the next keyword. I want the next keyword after "while" to be "for", not the "if" that is surrounded by quotes.

Can it be done in any way using regex? Or will I have to use conventional programming?

Upvotes: 3

Views: 4777

Answers (5)

Noldorin

Reputation: 147240

Try the following regex (Edit: fixed).

(?: |^)(?<kw>for|while|if)[ (](?=(?:[^\"\\n]*\"[^\"\\n]*\")*[^\"\\n]*$)

The lookahead at the end only accepts a keyword when the rest of its line contains an even number of quotes, i.e. when the keyword is not inside a string literal.

Note: This is written as the contents of a regular C# string literal, so the quotes are escaped as \" and regex escapes need a doubled backslash (e.g. \\w). If you prefer a verbatim (@) string, double each quote ("") instead and use single backslashes. Ensure that you also specify the Multiline option when matching, so that the caret (^) and dollar ($) are treated as the start and end of each line.

This hasn't been tested exhaustively, but should do the job for simple code like yours (no escaped quotes inside strings, no strings spanning several lines). Let me know if there are any problems. Also, depending on what more you want to do here, I might recommend using standard text parsing (non-regex), as it will quickly become more readable the more data you want to extract from the code. Hope that helps anyway.

Edit: Here's some example code that shows how to use it.

using System.Linq;
using System.Text.RegularExpressions;

var input = "while t < 10 loop\n s => \"this is if stmt\"; for u in 8..12 loop \n}";
var pattern = "(?: |^)(?<kw>for|while|if)[ (](?=(?:[^\"\\n]*\"[^\"\\n]*\")*[^\"\\n]*$)";
// Multiline makes ^ and $ match at the start and end of every line.
var matches = Regex.Matches(input, pattern, RegexOptions.Multiline);
var firstKeyword = matches[0].Groups["kw"].Value; // "while"
// The following line is a one-line solution for .NET 3.5/C# 3.0 to get an array of all found keywords.
var keywords = matches.Cast<Match>().Select(match => match.Groups["kw"].Value).ToArray();

Hopefully this should be your complete solution now...

Upvotes: 2

bobince

Reputation: 536329

Can it be done in any way using regex?

In the general case, no. The syntax of C# is not amenable to regex parsing.

Consider these corner cases:

method("xxx\"); while (\"xxx");

method(@"xxx \"); while (...);

// while

/* while */

/* xxx
// xxx */ while

/* xxx " xxx */ while ("...

Languages as complex as C# need dedicated parsers.
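Short of a full parser, a small hand-written scanner can at least cope with the corner cases listed above. The following is only a minimal sketch under stated assumptions: the names KeywordScanner and FindKeywords are made up for illustration, and it knows nothing about interpolated strings, raw string literals or preprocessor directives, so it is not a complete C# lexer.

```csharp
using System;
using System.Collections.Generic;

static class KeywordScanner
{
    // Hypothetical keyword list, taken from the question.
    static readonly HashSet<string> Keywords = new HashSet<string> { "for", "while", "if" };

    // Returns the listed keywords found outside comments, strings and char literals.
    public static List<string> FindKeywords(string code)
    {
        var found = new List<string>();
        int i = 0;
        while (i < code.Length)
        {
            char c = code[i];
            bool hasNext = i + 1 < code.Length;
            if (c == '/' && hasNext && code[i + 1] == '/')
            {
                // Line comment: skip to the end of the line.
                while (i < code.Length && code[i] != '\n') i++;
            }
            else if (c == '/' && hasNext && code[i + 1] == '*')
            {
                // Block comment: skip to the closing */ (or to the end of the input).
                int end = code.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? code.Length : end + 2;
            }
            else if (c == '@' && hasNext && code[i + 1] == '"')
            {
                // Verbatim string: "" is an escaped quote, a backslash is literal.
                i += 2;
                while (i < code.Length)
                {
                    if (code[i] != '"') { i++; continue; }
                    if (i + 1 < code.Length && code[i + 1] == '"') { i += 2; continue; }
                    i++; // closing quote
                    break;
                }
            }
            else if (c == '"' || c == '\'')
            {
                // Regular string or char literal: a backslash escapes the next character.
                char quote = c;
                i++;
                while (i < code.Length && code[i] != quote)
                {
                    if (code[i] == '\\') i++;
                    i++;
                }
                i++; // closing quote
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                // Read a whole word so that "iffy" or "format" are not mistaken for keywords.
                int start = i;
                while (i < code.Length && (char.IsLetterOrDigit(code[i]) || code[i] == '_')) i++;
                string word = code.Substring(start, i - start);
                if (Keywords.Contains(word)) found.Add(word);
            }
            else
            {
                i++;
            }
        }
        return found;
    }
}
```

Calling KeywordScanner.FindKeywords on the first corner case finds nothing, because the whole argument is one string literal, while the verbatim-string and comment cases each yield a single "while".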

Upvotes: 0

Draco

Reputation: 16364

If you decide to go the Regex route you can use this site to test your regular expression

Upvotes: 1

NileshChauhan

Reputation: 5559

I suppose a regex cannot readily understand C# keywords. I would suggest using Microsoft.CSharp.CSharpCodeProvider instead, which is what Visual Studio uses to manage C# code.

Upvotes: 0

John Leidegren

Reputation: 60987

You can try backreferencing, which would let you match the string, but since you want to do the exact opposite you'd be better off escaping the string instead; that's actually really easy.

Either write a regex that matches strings and replaces them with nothing, or run through the text skipping quoted strings and looking for keywords in the meantime. I reckon the latter will be more efficient.
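The first option could look roughly like the sketch below. It is an illustration under assumptions, not a complete solution: QuoteStripper and KeywordsOutsideStrings are hypothetical names, only regular double-quoted literals with backslash escapes are handled (no verbatim strings, no comments), and each literal is replaced with a single space rather than with nothing so that neighbouring tokens do not merge.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuoteStripper
{
    // A double-quoted literal: either an escaped character or anything that is not a quote or backslash.
    static readonly Regex StringLiteral = new Regex(@"""(?:\\.|[^""\\])*""");

    // Keywords from the question, matched as whole words only.
    static readonly Regex Keyword = new Regex(@"\b(for|while|if)\b");

    public static string[] KeywordsOutsideStrings(string code)
    {
        // Step 1: remove every string literal so its contents can no longer match.
        string stripped = StringLiteral.Replace(code, " ");

        // Step 2: whatever keywords remain were outside the quotes.
        return Keyword.Matches(stripped)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();
    }
}
```

Run against the code from the question, this returns "while" followed by "for"; the "if" inside "get if stmt" disappears along with the literal in the first step.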

Upvotes: 0
