Rezaeimh7

Reputation: 1545

C# regex: finding the number of a captured group

Suppose I have this pattern for replacing all URLs in a string:

string domains = "(gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";

string pattern = @"([\n ]|^)?(((https?|ftp)://)?(www\.)?([\w\d-]+\.)+" + domains + @"([/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?";

I want to replace every URL with a _URL_ tag but keep the delimiters on both the left and right sides of the match.

As far as I know, $1 refers to ([\n ]|^)? at the beginning of the pattern, but I couldn't find the correct number for ([\n ]|$)? at the end of the pattern.

Regex.Replace(data, pattern, "$1_URL_$?"); // which group number should replace the "?"

I tried $2 through $8 and none of them was correct.

Is there a specific rule for such situations?

Upvotes: 0

Views: 42

Answers (2)

Aman Chhabra

Reputation: 3894

From your requirements, it doesn't seem that you need to capture the remaining groups, so you can make them non-capturing.

Try this:

string pattern = @"([\n ]|^)?(?:(?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+" + domains + @"(?:[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?";

with the domains group made non-capturing as well:

string domains = "(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";

Only the two delimiter groups still capture, so you can simply use $2 for the trailing delimiter.
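
A minimal sketch of how this could be wired together, assuming a made-up input string (the data value below is not from the question). It is written with top-level statements, so it assumes C# 9 or later:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, chosen only for illustration.
string data = "visit http://www.example.com/path now";

// Every group except the two delimiter groups is non-capturing: (?:...)
string domains = "(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";
string pattern = @"([\n ]|^)?(?:(?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+" + domains + @"(?:[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?";

// $1 is the leading delimiter, $2 the trailing delimiter.
string result = Regex.Replace(data, pattern, "$1_URL_$2");
Console.WriteLine(result); // visit _URL_ now
```

A delimiter group that does not take part in a match is substituted as an empty string, so URLs at the very start or end of the input are handled without extra cases.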

I would also suggest simply using a single capture group and replacing it with _URL_.
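
The answer does not spell this suggestion out. One possible reading, sketched here as an assumption rather than as the author's exact method, is to drop the delimiter groups entirely and replace only the URL itself, so the surrounding whitespace is never part of the match. The input string is made up, and the snippet uses top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, chosen only for illustration.
string data = "mail me at example.com or http://www.example.org/a?b=1";

string domains = "(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";

// No delimiter groups: the match covers the URL only, so nothing needs to be put back.
string urlOnly = @"(?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+" + domains + @"(?:[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*";

string result = Regex.Replace(data, urlOnly, "_URL_");
Console.WriteLine(result); // mail me at _URL_ or _URL_
```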

Demo: https://regex101.com/r/UjyOKU/2

Upvotes: 1

wp78de

Reputation: 18950

Since you only need the group that matches the full URL, convert all inner parentheses into non-capturing groups: (...) becomes (?:...). You may also want to integrate the domains directly into the pattern:

([\n ]|^)?((?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)(?:[/][\w\d_~:?#\@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?

The leading delimiter is then captured in $1 and the trailing delimiter in $3. If you like, you can also convert the remaining URL group ($2) into a non-capturing group, in which case the trailing delimiter becomes $2.
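
The underlying rule is that .NET numbers unnamed groups from left to right by the position of their opening parenthesis, with group 0 being the whole match. The sketch below checks this with Regex.GetGroupNumbers for the pattern above and, for comparison, for the original pattern from the question. The input string is a made-up example, and the snippet uses top-level statements (C# 9 or later):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input, chosen only for illustration.
string data = "see example.org\nand ftp://files.example.net/pub";

// Pattern from this answer: only three groups capture.
var regex = new Regex(@"([\n ]|^)?((?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)(?:[/][\w\d_~:?#\@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?");

// Group 0 is the whole match; 1..3 follow the order of the opening parentheses.
Console.WriteLine(string.Join(",", regex.GetGroupNumbers())); // 0,1,2,3

string result = regex.Replace(data, "$1_URL_$3");
Console.WriteLine(result == "see _URL_\nand _URL_"); // True

// The question's original pattern: every pair of parentheses captures.
string domains = "(gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";
var original = new Regex(@"([\n ]|^)?(((https?|ftp)://)?(www\.)?([\w\d-]+\.)+" + domains + @"([/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?");

// Highest group number in the original pattern, i.e. the trailing delimiter group.
int lastGroup = original.GetGroupNumbers().Max();
Console.WriteLine(lastGroup); // 9
```

By that count, the trailing delimiter in the unmodified pattern is $9, which is why none of $2 through $8 worked.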


Upvotes: 1
