Sylver

Reputation: 8967

Regex.Replace doesn't seem to work with back-reference

I made an application designed to prepare files for translation using lists of regexes.

It runs each regex on the file using Regex.Replace. There is also an inspector module which allows the user to see the matches for each regex on the list.

It works well, except that when a regex contains a back-reference, Regex.Replace does not replace anything, even though the inspector shows the matches properly (so I know the regex is valid and matches what it should).

sSrcRtf = Regex.Replace(sSrcRtf, sTag, sTaggedTag,
  RegexOptions.Compiled | RegexOptions.Singleline);

sSrcRtf contains the RTF code of the page. sTag contains the regular expression wrapped in parentheses. sTaggedTag contains $1 surrounded by the tag formatting code.

To give an example:

sSrcRtf = Regex.Replace("the little dog", @"((e).*?\1)", "$1",
  RegexOptions.Compiled | RegexOptions.Singleline);

doesn't work. But

sSrcRtf = Regex.Replace("the little dog", "((e).*?e)", "$1", 
  RegexOptions.Compiled | RegexOptions.Singleline);

does. (Of course, in the real code there is some RTF code around $1.)

Any idea why this is?

Upvotes: 1

Views: 4737

Answers (3)

Adam Bellaire

Reputation: 110489

You technically have two match groups there, the outer and the inner parentheses. Why don't you try addressing the inner set as the second capture, e.g.:

((e).*?\2)

Capture groups are numbered by the position of their opening parenthesis, so the outer capture is \1, and it doesn't make much sense to backreference it from inside itself.
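To check which group is which, you can inspect the captures directly. This is only a minimal sketch: the class name GroupNumberingDemo and the helper CaptureOf are made up for illustration and are not part of your application.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, only for illustrating how groups are numbered.
public static class GroupNumberingDemo
{
    // Returns the text captured by the given group number in the first match,
    // or an empty string when the pattern does not match at all.
    public static string CaptureOf(string input, string pattern, int group)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[group].Value : "";
    }
}
```

With the input "the little dog" and the pattern @"((e).*?\2)", CaptureOf(..., 1) should give "e little" (the outer parentheses) and CaptureOf(..., 2) should give "e" (the inner ones).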

Also note that your replacement won't do anything, since you are asking to replace the portion that you match with itself. I'm not sure what your intended behavior is, but if you are trying to extract just the match and discard the rest of the string, you want something like:

.*((e).*?\2).*

Upvotes: 2

Ahmad Mageed

Reputation: 96477

As others have mentioned, there are some additional groups being captured. Your replacement isn't referencing the correct one.

Your current regex should be rewritten as (options elided):

Regex.Replace("the little dog", @"((e).*?\2)", "$2")
// or
Regex.Replace("the little dog", @"(e).*?\1", "$1")

Here's another example that matches doubled words and indicates which backreferences work:

Regex.Replace("the the little dog", @"\b(\w+)\s+\1\b", "$1")     // good: "the little dog"
Regex.Replace("the the little dog", @"\b((\w+)\s+\2)\b", "$1")   // no good: $1 is the whole match, so nothing changes
Regex.Replace("the the little dog", @"\b((\w+)\s+\2)\b", "$2")   // good: "the little dog"
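If the goal is to wrap each whole match in tag markup, as the question describes, the outer group can stay as $1 while the backreference inside the pattern points at the inner group. A rough sketch follows; TagWrapDemo, WrapMatches, and the "[" / "]" markers are placeholders I made up, standing in for the real RTF formatting code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: "open" and "close" stand in for the RTF tag markup.
public static class TagWrapDemo
{
    public static string WrapMatches(string input, string pattern, string open, string close)
    {
        // ${1} is the outer group, i.e. the whole tag. The braces keep the
        // group number from merging with a digit at the start of "close".
        // Note: a literal '$' inside open/close must be written as "$$".
        string replacement = open + "${1}" + close;
        return Regex.Replace(input, pattern, replacement, RegexOptions.Singleline);
    }
}
```

For example, WrapMatches("the little dog", @"((e).*?\2)", "[", "]") should return "th[e little] dog".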

Upvotes: 0

Welbog

Reputation: 60408

You're using a reference to a group inside the group you're referencing.

"((e).*?\1)" // first capturing group
"(e)" // second capturing group

I'm not 100% certain, but I don't think you can reference a group from within that group. For starters, what would you expect the backreference to match, since the group isn't even complete yet?
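A quick way to check this is to compare the two patterns directly. As far as I know, .NET treats a backreference to a group that has not captured anything yet as a failed match, so the \1 version should never match and Regex.Replace should hand back the input unchanged. The class and method names below are invented for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch for comparing a self-referencing pattern with a fixed one.
public static class SelfReferenceDemo
{
    // True when the pattern matches anywhere in the input.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.Singleline);
    }

    // Replaces every match with the text of the given group number.
    public static string ReplaceWithGroup(string input, string pattern, int group)
    {
        return Regex.Replace(input, pattern, "${" + group + "}", RegexOptions.Singleline);
    }
}
```

With "the little dog", Matches(..., @"((e).*?\1)") should be false, while Matches(..., @"((e).*?\2)") should be true, and ReplaceWithGroup(..., @"((e).*?\2)", 2) should give "the dog".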

Upvotes: 0
