Reputation: 1545
Consider this pattern for replacing all URLs in a string:
string domains = "(gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";
string pattern = @"([\n ]|^)?(((https?|ftp)://)?(www\.)?([\w\d-]+\.)+" + domains + @"([/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?";
I want to replace every URL with the _URL_
tag but keep the delimiters on both the left and right sides of the match.
As far as I know, $1
refers to ([\n ]|^)?
at the beginning of the pattern, but I couldn't find the correct number for ([\n ]|$)?
at the end of the pattern!
Regex.Replace(data, pattern, "$1_URL_$?"); // which group number should replace the '?'
I tested $2 through $8 and none of them was correct.
Is there a specific rule for such situations?
Upvotes: 0
Views: 42
Reputation: 3894
From your requirement, it doesn't seem you need to capture the remaining groups, so you can use non-capturing groups for them.
Try this:
string pattern = @"([\n ]|^)?(?:(?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+" + domains + @"(?:[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?";
and
string domains = "(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";
and then you can simply use $2
for the second group, which is now the trailing delimiter.
Moreover, I would suggest simply using one capture group and replacing it with _URL_.
Demo: https://regex101.com/r/UjyOKU/2
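Put together, the suggestion above might look like the sketch below. The class name UrlMasker, the method name Mask, and the sample input are only illustrative assumptions; the pattern itself is the one from this answer, where only the two delimiter groups capture.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlMasker
{
    // TLD alternation from the answer, wrapped in a non-capturing group.
    const string Domains = "(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)";

    // Only the leading and trailing delimiter groups capture, so they are $1 and $2.
    static readonly Regex UrlPattern = new Regex(
        @"([\n ]|^)?(?:(?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+" + Domains +
        @"(?:[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?");

    // Replaces each URL with _URL_ and puts the captured delimiters back around it.
    public static string Mask(string data)
    {
        return UrlPattern.Replace(data, "$1_URL_$2");
    }

    static void Main()
    {
        // Hypothetical sample text, not from the question.
        Console.WriteLine(Mask("see http://www.example.com/docs and ftp://files.example.org now"));
        // prints: see _URL_ and _URL_ now
    }
}
```

A delimiter group that did not take part in a match is substituted as an empty string, so a URL at the very start or end of the input still comes out as plain _URL_.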
Upvotes: 1
Reputation: 18950
Since you only need the group that matches the full URL, convert all inner parentheses into non-capturing groups: ()
to (?:)
. You may also want to integrate the domains directly into the pattern:
([\n ]|^)?((?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+(?:gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au)(?:[/][\w\d_~:?#\@!%$&'()*+,;=`\[\]\.\-]+)*)([\n ]|$)?
The front anchor is then captured in $1
and the rear anchor in $3
. Or convert the remaining URL group $2
into a non-capturing group as well, if you like.
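As for the general rule asked about: capturing groups are numbered by the position of their opening parenthesis, counted left to right, and (?:...) groups are skipped. By that count the trailing delimiter in the question's original pattern is group 9, which is why $2 through $8 all failed. Here is a rough sketch to check this; the names GroupNumbering, Tlds, PathPart, and CaptureCount are made up for the example, and the sample input is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupNumbering
{
    // Shared fragments, factored out only to keep the two patterns readable.
    const string Tlds = "gl|me|com|ir|org|net|edu|info|me|ac|name|biz|co|pro|ws|asia|mobi|tel|eu|in|ru|tv|cc|es|de|ca|mn|bz|uk|us|au";
    const string PathPart = @"[/][\w\d_~:?#@!%$&'()*+,;=`\[\]\.\-]+";

    // The question's pattern: every pair of parentheses captures.
    public static readonly Regex Original = new Regex(
        @"([\n ]|^)?(((https?|ftp)://)?(www\.)?([\w\d-]+\.)+(" + Tlds + ")(" + PathPart + @")*)([\n ]|$)?");

    // This answer's pattern: only the two delimiters and the whole URL capture.
    public static readonly Regex Reduced = new Regex(
        @"([\n ]|^)?((?:(?:https?|ftp)://)?(?:www\.)?(?:[\w\d-]+\.)+(?:" + Tlds + ")(?:" + PathPart + @")*)([\n ]|$)?");

    // Number of capturing groups, not counting group 0 (the whole match).
    public static int CaptureCount(Regex regex)
    {
        return regex.GetGroupNumbers().Length - 1;
    }

    static void Main()
    {
        string data = "see http://www.example.com/docs now"; // hypothetical input

        Console.WriteLine(CaptureCount(Original));               // 9
        Console.WriteLine(Original.Replace(data, "$1_URL_$9"));  // see _URL_ now

        Console.WriteLine(CaptureCount(Reduced));                // 3
        Console.WriteLine(Reduced.Replace(data, "$1_URL_$3"));   // see _URL_ now
    }
}
```

Parentheses inside a character class, such as the ( and ) in the path part, are literals and do not count toward the numbering.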
Upvotes: 1