Reputation: 45
Can someone modify this regex to remove repeated words, as in the example below?
My current pattern, (<.+?\/>)(?=\1), does not work when the input has "extra" in it:
<text><text>extra<words><text><words><something>
Should turn into:
<text>extra<words><something>
Thanks
Upvotes: 0
Views: 101
Reputation: 149020
This is what I've come up with using lookbehinds and back references:
(<[^>]+>)(?<=\1.*\1)
This will match any instance of <tag> which is preceded by at least one other instance of the same <tag>.
For example, to use this in C#:
using System;
using System.Text.RegularExpressions;
var input = "<text><text>extra<words><text><words><something>";
var output = Regex.Replace(input, @"(<[^>]+>)(?<=\1.*\1)", "");
Console.WriteLine(output); // <text>extra<words><something>
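However, this will not work in many flavors of regex, because it relies on a variable-length lookbehind. Python's re module, for example, only allows fixed-width lookbehinds, and JavaScript did not support lookbehinds at all before ES2018.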
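If the lookbehind is a concern, one possible alternative is to keep the regex simple and track the tags already seen in code. The sketch below is only an illustration, not part of the original answer: the class name TagDeduper and the method RemoveRepeatedTags are hypothetical, and it assumes that "duplicate" means an exact textual repeat of an earlier <tag>.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagDeduper
{
    // Hypothetical helper: keeps the first occurrence of each <tag>
    // and removes every later exact repeat of it.
    public static string RemoveRepeatedTags(string input)
    {
        var seen = new HashSet<string>();

        // The pattern only finds tags; no lookbehind or backreference is needed.
        // HashSet.Add returns false when the tag was already seen,
        // so a repeated tag is replaced with the empty string.
        return Regex.Replace(input, @"<[^>]+>",
            m => seen.Add(m.Value) ? m.Value : "");
    }

    public static void Main()
    {
        var input = "<text><text>extra<words><text><words><something>";
        Console.WriteLine(RemoveRepeatedTags(input)); // <text>extra<words><something>
    }
}
```

The same idea (match every tag, keep a set of the ones already emitted) should carry over to most languages whose regex library accepts a replacement callback.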
Upvotes: 1