Chuck Norris

Reputation: 15190

Regex.Replace behaves strangely with a reluctant match

While answering this question I got stuck on the following situation. Using a reluctant (lazy) match in my regex produces this result:

string s = Regex.Replace(".A.", "\\w*?", "B");

B.BAB.B

Why doesn't it match and replace the A?

Upvotes: 0

Views: 145

Answers (1)

mathematical.coffee

Reputation: 56915

Because \w*? (written "\\w*?" in the C# string) matches as few \w characters as it possibly can, including 0 of them.

Since you have \w* instead of \w+, the regex matches 0 or more \w.

Since you have an additional ? on the \w*, the quantifier is reluctant: it prefers the smallest possible match, which for this regex is the 0-length string, ''.

Since nothing follows \w*? in the pattern, nothing ever forces it to consume more, so it only ever matches 0-length strings. It never matches the single character A, because the empty match at that position always succeeds first, and after an empty match the engine moves on to the next position.

Hence all 0-length strings in .A. (being: ''.''A''.'', where each possible 0-length string is marked as '') are replaced with a 'B', giving you 'B.BAB.B'.
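To see the empty matches for yourself, a small sketch along these lines lists where \w*? matches in .A. (the class and method names are just illustrative, not part of any API):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Hypothetical helper: returns "index:length" for every match of the pattern.
    public static string[] Describe(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => $"{m.Index}:{m.Length}")
             .ToArray();

    public static void Main()
    {
        // Four zero-length matches: before '.', before 'A', before the second '.', and at the end.
        Console.WriteLine(string.Join(" ", Describe(".A.", @"\w*?"))); // 0:0 1:0 2:0 3:0

        // Each empty match gets a 'B'; the original characters are copied through untouched.
        Console.WriteLine(Regex.Replace(".A.", @"\w*?", "B"));         // B.BAB.B
    }
}
```

Every match has length 0, so the A is never part of a match and is simply copied into the result.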

If you want to avoid this behaviour and replace at least one \w, you can use the regex \w+?. However, by the same reasoning as before, the ? makes it match exactly one \w at a time, so you may as well use the regex \w.
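As a quick check of the alternatives, here is a minimal sketch (the class name, the Sub helper and the .xyz. input are made up for illustration). It also includes the greedy \w+ for comparison, in case you want a whole run of word characters replaced by a single B:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: replace every match of the pattern with "B".
    public static string Sub(string input, string pattern) =>
        Regex.Replace(input, pattern, "B");

    public static void Main()
    {
        Console.WriteLine(Sub(".A.", @"\w+?"));   // .B.
        Console.WriteLine(Sub(".A.", @"\w"));     // .B.

        // With a longer run, the lazy \w+? still stops after one character each time...
        Console.WriteLine(Sub(".xyz.", @"\w+?")); // .BBB.
        // ...while the greedy \w+ consumes the whole run in a single match.
        Console.WriteLine(Sub(".xyz.", @"\w+"));  // .B.
    }
}
```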

Upvotes: 5
