Margus

Reputation: 20038

How to ignore regex matches in C#?

An input string:

string datar = "aag, afg, agg, arg";

I am trying to match only "aag" and "arg", but neither of the following patterns works:

string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";

What is the correct way to exclude specific characters from a character class in a C# regex?

Upvotes: 1

Views: 2290

Answers (6)

Alan Moore

Reputation: 75222

What you're using is Java's set intersection syntax:

a[a-z&&[^fg]]g

...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:

a[a-z-[fg]]g

...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').

Java demo:

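import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "aag, afg, agg, arg, a%g";

// && intersects [a-z] with "anything except f or g"
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
  System.out.println(m.group());
}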

C# demo:

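using System;
using System.Text.RegularExpressions;

string s = @"aag, afg, agg, arg, a%g";

// -[fg] subtracts 'f' and 'g' from the range a-z
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
  Console.WriteLine(m.Value);
}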

Output of both is

aag
arg

Upvotes: 3

atamanroman

Reputation: 11808

Try this if you want to match only arg and aag:

a[ar]g

If you want to match everything except afg and agg (including non-letters in the middle position, such as a%g), you need this regex:

a[^fg]g
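A quick way to see how the two patterns differ is to run both over the same string. This is only a sketch: it assumes the question's input with an extra a%g entry added for illustration, and the class name is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

class WhitelistVsNegationDemo
{
    static void Main()
    {
        // The question's input plus "a%g", added here to expose the difference.
        string s = "aag, afg, agg, arg, a%g";

        // Whitelist: only 'a' or 'r' is allowed in the middle.
        foreach (Match m in Regex.Matches(s, "a[ar]g"))
            Console.WriteLine("a[ar]g  -> " + m.Value);   // aag, arg

        // Negation: anything except 'f' or 'g' is allowed in the middle.
        foreach (Match m in Regex.Matches(s, "a[^fg]g"))
            Console.WriteLine("a[^fg]g -> " + m.Value);   // aag, arg, a%g
    }
}
```

The negated class accepts a%g because % is simply "not f and not g", which may or may not be what you want.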

Upvotes: 2

Julien Hoarau

Reputation: 49970

The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind, like this:

string regr = "a[a-z](?<!f|g)g";

Explanation:

  • a: match the character "a"
  • [a-z]: match a single character in the range between "a" and "z"
  • (?<!XXX): assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
    • f|g: match the character "f" or match the character "g"
  • g: match the character "g"
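The breakdown above can be checked with a short program. This is a minimal sketch, assuming the question's input with an a%g entry added for illustration; the class name is made up.

```csharp
using System;
using System.Text.RegularExpressions;

class LookbehindDemo
{
    static void Main()
    {
        // The question's input plus "a%g" (an assumption, added for illustration).
        string s = "aag, afg, agg, arg, a%g";

        // [a-z] consumes the middle letter, then (?<!f|g) looks back at it
        // and rejects the match if that letter was 'f' or 'g'.
        string regr = "a[a-z](?<!f|g)g";

        foreach (Match m in Regex.Matches(s, regr))
            Console.WriteLine(m.Value);   // prints aag, then arg
    }
}
```

Unlike a[^fg]g, this pattern still requires a lowercase letter in the middle, so a%g is not matched.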

Upvotes: 3

VladV

Reputation: 10349

Regex: a[a-eh-z]g. Then use Regex.Matches to get the matched substrings.
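For example, something along these lines should collect the matched substrings. It is a sketch using the question's input string; the class name and the found list are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class MatchesDemo
{
    static void Main()
    {
        string datar = "aag, afg, agg, arg";   // input from the question

        // Regex.Matches returns every non-overlapping match in the input.
        var found = new List<string>();
        foreach (Match m in Regex.Matches(datar, "a[a-eh-z]g"))
            found.Add(m.Value);

        Console.WriteLine(string.Join(", ", found));   // prints: aag, arg
    }
}
```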

Upvotes: 0

John Kugelman

Reputation: 361585

Character classes aren't quite that fancy. The simple solution is:

a[a-eh-z]g

If you really want to explicitly list out the letters that don't belong, you could try something like:

a[^\W\d_A-Zfg]g

This negated character class is built up by excluding one group of characters after another:

  1. \W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
  2. \d removes digits so now we have letters and the underscore _.
  3. _ removes the underscore so now we only match letters.
  4. A-Z removes uppercase letters so now we only match lowercase letters.
  5. Finally at this point we can list the individual lowercase letters we don't want to match.
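Here is a small sketch of that pattern in use. The extra test entries (a%g, a1g, a_g, aAg) and the accented example are my own additions, not part of the question. One caveat worth knowing: in .NET, \W and \d are Unicode-aware by default, so the "letters" left over are not limited to ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

class DoubleNegativeDemo
{
    static void Main()
    {
        string pattern = @"a[^\W\d_A-Zfg]g";

        // Question's input plus one entry per excluded group (illustrative).
        string s = "aag, afg, agg, arg, a%g, a1g, a_g, aAg";
        foreach (Match m in Regex.Matches(s, pattern))
            Console.WriteLine(m.Value);   // prints aag, then arg

        // A non-ASCII lowercase letter (e with acute accent) is still a word character.
        string accented = "a\u00e9g";
        Console.WriteLine(Regex.IsMatch(accented, pattern));        // True
        Console.WriteLine(Regex.IsMatch(accented, "a[a-eh-z]g"));   // False
    }
}
```

So the two patterns agree on ASCII input but can differ once accented or other non-ASCII letters show up.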

All in all way more complicated than we'd likely ever want. That's regular expressions for ya!

Upvotes: 3

Donut

Reputation: 112815

It seems like you're trying to match three-letter strings that start with a and end with g, with the condition that the middle letter cannot be f or g. If this is the case, why not use the following regular expression:

string regr = "a[a-eh-z]g";

Upvotes: 0
