Miguel

Reputation: 558

Don't use capturing groups in C# Regex

I am writing a regular expression in Visual Studio 2013 using C#.

I have the following scenario:

Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");

But my problem is that I don't want to capture groups. The reason is that with capture groups, match.Value contains %%More text%%, and my idea is to get the string More text directly in match.Value.

The string to get will always be between the second and the third group of %%. Put another way, the string will always be between the fourth and fifth %.

I tried:

Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");

But with no luck.

I want to use match.Value because all my regexes are stored in a database table.

Is there a way to "transform" that regex into one that does not use capturing groups and puts the desired string in match.Value?

Upvotes: 1

Views: 932

Answers (1)

Wiktor Stribiżew

Reputation: 627469

If you are sure you have no %s inside double %%s, you can just use lookarounds like this:

(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^      ^^^^^
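As a quick check of the pattern above, here is a minimal sketch that runs it against the string from the question. The class and method names (SecondFieldDemo, Extract) are made up for illustration; only Regex.Match and match.Value come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondFieldDemo
{
    // Pattern from the answer: the %% delimiters sit inside lookarounds,
    // so they are checked but never become part of match.Value.
    public const string Pattern = @"(?<=^%%[^%]*%%)[^%]+(?=%%)";

    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string Extract(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Prints: More text
        Console.WriteLine(Extract("%%Text%%More text%%More more text"));
    }
}
```

Since the pattern is a plain string with no capturing groups, it can be stored in the database table as is and read back through match.Value.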

If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):

(?<=^%[^%]*%)[^%]+(?=%)

See regex demo

And in case it is between the 4th and the 5th %% delimiter:

(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^

For single-% delimited strings (see demo):

(?<=^%(?:[^%]*%){3})[^%]+(?=%)

See another demo
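The only thing that changes between the double-%% patterns above is the repeat count inside the lookbehind. The sketch below makes that explicit with a hypothetical helper, BuildPattern, which is not part of the answer and assumes there are no % characters inside the fields.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthFieldDemo
{
    // Hypothetical helper: builds the pattern for the text that follows the
    // given %% delimiter (1-based). For delimiterNumber = 4 this produces the
    // {3} pattern from the answer; for 2 it is equivalent to the first pattern.
    public static string BuildPattern(int delimiterNumber)
    {
        if (delimiterNumber < 1)
            throw new ArgumentOutOfRangeException("delimiterNumber");

        // The leading ^%% already accounts for the first delimiter.
        int skipped = delimiterNumber - 1;
        return @"(?<=^%%(?:[^%]*%%){" + skipped + @"})[^%]+(?=%%)";
    }

    public static string Extract(string input, int delimiterNumber)
    {
        Match match = Regex.Match(input, BuildPattern(delimiterNumber));
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string input = "%%a%%b%%c%%d%%e%%f";
        Console.WriteLine(Extract(input, 2)); // b
        Console.WriteLine(Extract(input, 4)); // d
    }
}
```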

Both regexes combine a variable-width lookbehind (which the .NET regex engine supports) with the same lookahead. Together they restrict the context in which the 1 or more characters other than % may appear, without adding the delimiters to the match.

The (?<=^%%[^%]*%%) lookbehind makes sure there is %%[something_other_than_%]%% right after the beginning of the string and immediately before the match, and (?<=^%%(?:[^%]*%%){3}) requires %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% between the string start and the match.
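To see the effect of the lookarounds on the match itself, the following sketch compares the pattern from the question, which consumes the delimiters, with the lookaround version, which only asserts them. The class and the Describe helper are illustrative names, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Pattern from the question: the %% delimiters are consumed by the match.
    public const string Consuming = @"(?<!^)%%[^%]+%%";

    // Pattern from the answer: the %% delimiters are only asserted.
    public const string Asserting = @"(?<=^%%[^%]*%%)[^%]+(?=%%)";

    // Hypothetical helper: reports the value, start index and length of the first match.
    public static string Describe(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
            return "no match";
        return string.Format("{0} @ {1}, length {2}", match.Value, match.Index, match.Length);
    }

    public static void Main()
    {
        string input = "%%Text%%More text%%More more text";
        Console.WriteLine(Describe(input, Consuming)); // %%More text%% @ 6, length 13
        Console.WriteLine(Describe(input, Asserting)); // More text @ 8, length 9
    }
}
```

The second match starts two characters later and is four characters shorter, which is exactly the two %% delimiters that the lookarounds leave out.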

In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):

(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)

This matches the same text as the simpler (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*?-based solution should work faster. For very long input texts, use the unrolled version.
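A small sketch of the difference, using a made-up input whose fourth field contains a single %. The class name, the First helper and the sample string are assumptions for illustration; the three patterns are taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePercentDemo
{
    // Plain version: fields may not contain any % at all.
    public const string Plain = @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)";

    // Unrolled version: a single % is allowed as long as it is not followed by another %.
    public const string Unrolled =
        @"(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)";

    // Lazy version: shorter to read, relies on backtracking.
    public const string Lazy = @"(?<=^%%(?:.*?%%){3}).*?(?=%%)";

    // Hypothetical helper: returns the first match, or null when there is none.
    public static string First(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string input = "%%a%%b%%c%%50% off%%e";
        Console.WriteLine(First(input, Unrolled));      // 50% off
        Console.WriteLine(First(input, Lazy));          // 50% off
        Console.WriteLine(First(input, Plain) == null); // True: the single % breaks [^%]+
    }
}
```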

Upvotes: 2
