Reputation: 32949
I am trying to strip all but a few HTML tags from a string using a regex. What I am currently trying is as follows:
var a = "<pre><code><p>This is a <span>test</span></p></code></pre>";
var b = a.replace(/(\<|\<\/)[^\>,p,li,br,b]*\>/ig,"");
but b's value is returned as "<pre><p>This is a <span>test</span></p></pre>"
It seems any tag that contains even a single one of the characters in [>,p,li,br,b] is not being removed, as here pre, span and p all contain the character p. However, I only want to ignore tags whose names exactly match the ones listed in [^\>,p,li,br,b].
The output I am expecting is "<p>This is a test</p>".
What am I doing wrong?
Upvotes: 0
Views: 569
Reputation: 2397
var a = "<pre><code><p>This is a <span>test</span></p></code></pre>";
var b = a.replace(/\<(?!\/?(p|li|br|b)[ >])[^>]*\>/ig,"");
This regex matches the leading < or </ only if it is not followed by one of the tag names you want to keep (p, li, br, b), itself followed by a space or a closing > so that it doesn't think <pre> is <p>. Then it matches everything up to the closing >.
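As a quick check of the explanation above, here is a minimal sketch that builds the same negative-lookahead pattern from a list of tag names. The helper name stripTagsExcept is made up for illustration, and the sketch assumes the tag names are plain letters or digits, so nothing is escaped.

```javascript
// Hypothetical helper (the name is illustrative, not from any library).
// Builds the same negative-lookahead pattern from a list of tag names to keep.
// Assumption: names are plain alphanumerics, so no regex escaping is done.
function stripTagsExcept(html, keep) {
  const pattern = new RegExp(
    "<(?!\\/?(?:" + keep.join("|") + ")[ >])[^>]*>",
    "gi"
  );
  return html.replace(pattern, "");
}

const a = "<pre><code><p>This is a <span>test</span></p></code></pre>";
const b = stripTagsExcept(a, ["p", "li", "br", "b"]);
console.log(b); // <p>This is a test</p>
```

One thing to be aware of: a self-closing <br/> written without a space is still stripped, because the / after br is neither a space nor >. Replacing [ >] with \b is one possible way to keep it.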
Upvotes: 3
Reputation: 42089
See this answer.
That said, square brackets [] match on single characters, not words - for more information on what yours is doing, see the bottom of this answer. Instead, you would need to use parentheses, (?:p|li|br|b), to match words - the ?: is used to avoid capturing. Also, the parentheses would occur outside of the square brackets.
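To make the difference concrete, here is a small sketch comparing a character class with a group, and a capturing group with a non-capturing one. The variable names are only for illustration.

```javascript
// A character class matches exactly one character from the set.
console.log(/^[li]$/.test("li")); // false: "li" is two characters
console.log(/^[li]$/.test("l"));  // true

// A group with alternation matches whole alternatives.
console.log(/^(?:p|li|br|b)$/.test("li"));  // true
console.log(/^(?:p|li|br|b)$/.test("pre")); // false

// Capturing vs non-capturing: only the plain (...) form records the match.
const withCapture = "<li>".match(/<(p|li|br|b)>/);
const withoutCapture = "<li>".match(/<(?:p|li|br|b)>/);
console.log(withCapture.length);    // 2 (full match plus one group)
console.log(withoutCapture.length); // 1 (full match only)
```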
Since you're using a negative match, you may wish to look into lookarounds; specifically, negative lookahead, which lets the pattern reject a tag name without consuming it.
[^\>,p,li,br,b] translates to: not > and not , and not p and not , and not l and not i and not , and not b and not r and not , and not b.
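Since repeated characters in a class are redundant, the class above should behave the same as [^>,plibr]. A short sketch to confirm that on the string from the question (the names original, simplified and input are only for illustration):

```javascript
// The question's pattern, and an equivalent with the duplicates removed.
const original = /(\<|\<\/)[^\>,p,li,br,b]*\>/ig;
const simplified = /(<|<\/)[^>,plibr]*>/ig;

const input = "<pre><code><p>This is a <span>test</span></p></code></pre>";

// Both strip only <code> and </code>, the one tag name here that
// contains none of the excluded characters.
console.log(input.replace(original, ""));
console.log(input.replace(simplified, ""));
// Both print: <pre><p>This is a <span>test</span></p></pre>
```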
Upvotes: 1