Reputation: 14907
This one is a real head scratcher for me...
var matches = Regex.Matches("<p>test something<script language=\"javascript\">alert('hello');</script> and here's <b>bold</b> and <i>italic</i> and <a href=\"http://popw.com/\">link</a>.</p>", "</?(?!p|a|b|i)\b[^>]*>");
The regex is supposed to capture any HTML tag (open or close) that's not p, a, b, or i. I've plugged the input string and regex into countless testing pages, and every one of them returns the script tag (open and close) as matches. But it absolutely doesn't work in the code: the matches variable has a count of 0.
Am I missing something incredibly obvious?
Upvotes: 4
Views: 537
Reputation: 89171
(?! ) is a negative look-ahead. It matches zero characters if its contained pattern does not match from the current position.
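(?!p|a|b|i)\\b will look at the next character to see if it matches p|a|b|i. If it does, the look-ahead fails, and so does the match attempt at that position. If the contained pattern fails to match, the look-ahead succeeds, and the engine tries to match the next token in the pattern from the same position, in this case a word boundary. Because only one character is checked, any tag whose name merely starts with p, a, b, or i (pre or img, for example) is rejected as well.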
What you want is probably something like this:
@"</?(?!(?:p|a|b|i)\b)\w+[^>]*>"
It looks ahead for something that matches (?:p|a|b|i)\b. If that pattern fails to match, the look-ahead succeeds, and the regex matches at least one word character, followed by any number of characters up to the closing ">".
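To make the difference concrete, here is a minimal sketch that runs the pattern above against a sample string. The class name TagFilterSketch, the helper DisallowedTags, and the sample HTML are made up for illustration; only the pattern comes from this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagFilterSketch
{
    // Pattern from the answer, as a verbatim string so \b and \w
    // reach the regex engine unchanged.
    public const string Pattern = @"</?(?!(?:p|a|b|i)\b)\w+[^>]*>";

    // Hypothetical helper: returns every tag whose name is not exactly p, a, b, or i.
    public static string[] DisallowedTags(string html)
    {
        return Regex.Matches(html, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Sample input (an assumption, not the string from the question).
        string html = "<p>text <img src=\"x.png\"> <b>bold</b> <pre>code</pre></p>";

        // Prints:
        // <img src="x.png">
        // <pre>
        // </pre>
        foreach (string tag in DisallowedTags(html))
        {
            Console.WriteLine(tag);
        }
    }
}
```

Note that img and pre are reported even though their names begin with i and p, because the \b inside the look-ahead requires the whole tag name to be p, a, b, or i before the tag is skipped.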
Upvotes: 0
Reputation: 700342
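You forgot to escape the backslash in the pattern string. In a regular C# string literal, \b is the escape sequence for the backspace character, so the regex engine never sees a word-boundary assertion.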
"</?(?!p|a|b|i)\\b[^>]*>"
Upvotes: 8
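As a quick check of the escaping explanation, the sketch below counts matches for three spellings of the question's pattern against the question's input string. The class name EscapeSketch and the helper CountMatches are hypothetical; the verbatim-string variant (@"...") is an alternative spelling that the answers only use in passing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    // Input string from the question.
    public const string Html =
        "<p>test something<script language=\"javascript\">alert('hello');</script>" +
        " and here's <b>bold</b> and <i>italic</i> and" +
        " <a href=\"http://popw.com/\">link</a>.</p>";

    // Hypothetical helper: number of matches of pattern in the question's input.
    public static int CountMatches(string pattern)
    {
        return Regex.Matches(Html, pattern).Count;
    }

    public static void Main()
    {
        // Regular literal, single backslash: the compiler turns \b into a
        // backspace character, which never occurs in the input.
        Console.WriteLine(CountMatches("</?(?!p|a|b|i)\b[^>]*>"));   // 0

        // Regular literal, escaped backslash: the regex engine receives \b
        // and treats it as a word boundary.
        Console.WriteLine(CountMatches("</?(?!p|a|b|i)\\b[^>]*>"));  // 2

        // Verbatim literal: backslashes are passed through unchanged.
        Console.WriteLine(CountMatches(@"</?(?!p|a|b|i)\b[^>]*>"));  // 2
    }
}
```

The two matches are the opening and closing script tags, which is what the online testers reported.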