user937036

Reputation: 391

C# Regular expression to match on a character not following pairs of the same character

Objective: Regex Matching

For this example I'm interested in matching a "|" pipe character. I need to match it if it's alone, as in "aaa|aaa". In a longer run of pipes, I need to match the last pipe only if it's preceded by complete pairs of pipes (2, 4, 6, 8... any even number).

Another way to put it: I want to ignore ALL pipe pairs "||" (right to left), so that only the bachelor bars are selected (the odd man out).

string twomatches = "aaaaaaaaa|||||aaaaaa|||aaaaaa"; // last pipe of each odd run matches
string onematch = "aaaaaaaaa|||aaaaaaa||aaaaaaaa";   // only the last pipe of the triple matches

string noMatch1 = "||";
string noMatch2 = "||||";

I'm trying to select the last "|" only when it is preceded by an even number of "|" characters (whole pairs), or when a single bar stands by itself, regardless of how many "|" the string contains overall.

Upvotes: 2

Views: 271

Answers (2)

bobble bubble

Reputation: 18515

Oh, it's reopened! If you need better performance, also try this improved version, which uses a negative lookbehind.

\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)

The idea here is to first match the rightmost literal | of a sequence (or a single |) and then run a negated lookbehind from the position just after the match. This should perform considerably better.

  • \|(?!\|) matches a literal | IF NOT followed by another pipe character (the rightmost one in a sequence).
  • (?<!(?:[^|]|^)(?:\|\|)*) succeeds IF the position right after the matched | IS NOT preceded by (?:\|\|)*, any number of literal ||, reaching back to a non-| character or the ^ start.
    In other words: if this position is not preceded by an even number of pipe characters.

Btw, there is no performance gain in using \|{2} over \|\|, though it might be more readable.

See demo at regexstorm
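
To check the pattern locally, here is a minimal sketch that counts matches on plain-pipe inputs modelled on the question's samples. The class name OddPipeDemo and the helper CountOddPipes are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OddPipeDemo
{
    // Pattern from this answer: a pipe not followed by another pipe,
    // rejected when the run ending at that pipe has even length.
    private static readonly Regex OddPipe =
        new Regex(@"\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)");

    // Hypothetical helper: number of "odd one out" pipes in the input.
    public static int CountOddPipes(string input) => OddPipe.Matches(input).Count;

    public static void Main()
    {
        Console.WriteLine(CountOddPipes("aaa|aaa"));                       // 1
        Console.WriteLine(CountOddPipes("aaaaaaaaa|||||aaaaaa|||aaaaaa")); // 2
        Console.WriteLine(CountOddPipes("aaaaaaaaa|||aaaaaaa||aaaaaaaa")); // 1
        Console.WriteLine(CountOddPipes("||"));                            // 0
        Console.WriteLine(CountOddPipes("||||"));                          // 0
    }
}
```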

Upvotes: 0

Wiktor Stribiżew

Reputation: 627077

You may use the following regex to select just the odd pipe out:

(?<=(?<!\|)(?:\|{2})*)\|(?!\|)

See regex demo.

The regex breakdown:

  • (?<=(?<!\|)(?:\|{2})*) - if a pipe is preceded by an even number of pipes ((?:\|{2})* - 0 or more sequences of exactly 2 pipes), counted from a position that has no preceding pipe ((?<!\|))
  • \| - match an odd pipe on the right
  • (?!\|) - if it is not followed by another pipe.
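
As a quick illustration of the breakdown above, this sketch lists the zero-based index of each pipe the regex selects. The class OddPipeIndexes, the helper Find and the short inputs are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OddPipeIndexes
{
    // Pattern from this answer (variable-width lookbehind, which .NET supports).
    private static readonly Regex OddPipe =
        new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");

    // Hypothetical helper: zero-based index of every matched pipe.
    public static int[] Find(string input) =>
        OddPipe.Matches(input).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Find("a|b")));     // 1
        Console.WriteLine(string.Join(",", Find("a||b")));    // prints an empty line
        Console.WriteLine(string.Join(",", Find("a|||b|c"))); // 3,5
    }
}
```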

Please note that this regex uses a variable-width look-behind and is very resource-consuming. I'd rather use a capturing group mechanism here, but it all depends on the actual purpose of matching that odd pipe.

Here is a modified version of the regex for removing the odd one out:

using System;
using System.Text.RegularExpressions;

var s = "1|2||3|||4||||5|||||6||||||7|||||||";
var data = Regex.Replace(s, @"(?<!\|)(?<even_pipes>(?:\|{2})*)\|(?!\|)", "${even_pipes}");
Console.WriteLine(data); // 12||3||4||||5||||6||||||7||||||

See IDEONE demo. Here, the quantified part is moved from the lookbehind to an even_pipes named capturing group, so that it can be restored with the ${even_pipes} backreference in the replacement string. Regexhero.net shows 129,046 iterations per second for the version with a capturing group and 69,206 for the original version with the variable-width lookbehind.
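
If you want to compare the two approaches on your own machine, a rough Stopwatch sketch like the one below may help. The class OddPipeTiming, its helpers and the iteration count are assumptions for illustration, and the timings will not necessarily reproduce the Regexhero figures.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class OddPipeTiming
{
    private static readonly Regex Lookbehind =
        new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");
    private static readonly Regex Capturing =
        new Regex(@"(?<!\|)(?<even_pipes>(?:\|{2})*)\|(?!\|)");

    // Remove the odd pipe by matching only that pipe.
    public static string RemoveWithLookbehind(string s) => Lookbehind.Replace(s, "");

    // Remove the odd pipe by matching the whole run and putting the pairs back.
    public static string RemoveWithCapture(string s) => Capturing.Replace(s, "${even_pipes}");

    // Hypothetical helper: elapsed milliseconds for `iterations` calls of `action`.
    public static long Time(Func<string, string> action, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) action(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const string s = "1|2||3|||4||||5|||||6||||||7|||||||";
        // Both approaches should produce the same string.
        Console.WriteLine(RemoveWithLookbehind(s) == RemoveWithCapture(s));
        Console.WriteLine($"lookbehind: {Time(RemoveWithLookbehind, s, 100_000)} ms");
        Console.WriteLine($"capturing:  {Time(RemoveWithCapture, s, 100_000)} ms");
    }
}
```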

Only use variable-width look-behind if it is absolutely necessary!

Upvotes: 1
