Reputation: 595
I'm trying to remove all HTML tags except p, a and img tags. Right now I have:
content.replace(/(<([^>]+)>)/ig,"");
But this removes all HTML tags.
These are examples of the content from the API:
<table id="content_LETTER.BLOCK9" border="0" width="100%" cellspacing="0" cellpadding="0" bgcolor="#F7EBF5">
<tbody><tr><td class="ArticlePadding" colspan="1" rowspan="1" align="left" valign="top"><div>what is the opposite of...[] rest of text
Upvotes: 8
Views: 9961
Reputation: 161
var input = 'b<p on>b <p>good p</p> a<a>a h1<h1>h1 p<pre>p p</p onl>p img<img src/>img';
var output = input.replace(/(<(?!\/?((a|img)(\s+[^>]+)*|p)\s*>)([^>]+)>)/ig, '');
console.log(output);
output: bb <p>good p</p> a<a>a h1h1 pp pp img<img src/>img
And if you'd like to remove JS event handler attributes:
var input = 'b<p on>b <p>good p</p> a<a>a h1<h1>h1 p<pre>p p</p onl>p img<img src="y.gif" /> see <img src="x.png" onerror alt="cat" /> there';
var output = input.replace(/(<(?!\/?((a|img)(\s+((?!on)[^>])+)*|p)\s*>)([^>]+)>)/ig, '');
console.log(output);
output: bb <p>good p</p> a<a>a h1h1 pp pp img<img src="y.gif" /> see there
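One caveat with the pattern above: (?!on) is tested at every character, so a kept tag is also stripped when the letters "on" appear anywhere in its attributes, for example alt="button". Below is a minimal sketch of a stricter variant, assuming it is enough to reject whitespace-separated attribute tokens that begin with "on". The name stripUnsafeTags is hypothetical, and this is still a regex heuristic, not an HTML sanitizer.

```javascript
// Hypothetical helper: keep <p>, <a> and <img>, but drop any <a> or <img>
// that has a whitespace-separated attribute token starting with "on".
function stripUnsafeTags(html) {
  // Inside the lookahead, every token after the tag name must be preceded
  // by whitespace and must not begin with "on" (case-insensitive).
  var pattern = /<(?!\/?(?:(?:a|img)(?:\s+(?!on)[^\s>]+)*|p)\s*>)[^>]+>/ig;
  return html.replace(pattern, '');
}

var input = '<img src="y.gif" alt="button" /> see <img src="x.png" onerror alt="cat" /> there';
console.log(stripUnsafeTags(input));
// <img src="y.gif" alt="button" /> see  there
```

A quoted attribute value that contains a space followed by a word starting with "on" (such as alt="cat on mat") is still rejected by this sketch, since it splits on whitespace rather than parsing quotes.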
Upvotes: 0
Reputation: 9650
You can match the tags to keep in a capture group and, using alternation, match all other tags. Then replace with $1:
(<\/?(?:a|p|img)[^>]*>)|<[^>]+>
Demo: https://regex101.com/r/Sm4Azv/2
And the JavaScript demo:
var input = 'b<body>b a<a>a h1<h1>h1 p<p>p p</p>p img<img />img';
var output = input.replace(/(<\/?(?:a|p|img)[^>]*>)|<[^>]+>/ig, '$1');
console.log(output);
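Note that (?:a|p|img)[^>]* also matches tag names that merely begin with a or p, such as <pre> or <abbr>, so those tags are kept too. Below is a minimal sketch of a reusable variant that adds a word boundary after the tag name. The function name stripTagsExcept and its parameters are hypothetical, and it assumes the allowed names are plain alphanumeric tag names.

```javascript
// Hypothetical helper: remove every tag except those named in `allowed`.
// Assumes `allowed` holds plain tag names (no regex metacharacters).
function stripTagsExcept(html, allowed) {
  // Builds e.g. (<\/?(?:a|p|img)\b[^>]*>)|<[^>]+>
  // The \b stops "p" from also matching "pre" and "a" from matching "abbr".
  var names = allowed.join('|');
  var pattern = new RegExp('(<\\/?(?:' + names + ')\\b[^>]*>)|<[^>]+>', 'ig');
  // $1 is empty when the second alternative matched, so those tags vanish.
  return html.replace(pattern, '$1');
}

var input = 'b<body>b a<a>a h1<h1>h1 p<p>p p</p>p pre<pre>pre img<img />img';
console.log(stripTagsExcept(input, ['a', 'p', 'img']));
// bb a<a>a h1h1 p<p>p p</p>p prepre img<img />img
```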
Upvotes: 14
Reputation: 4981
You can use the regex below to remove all HTML tags except a, p and img:
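<\/?(?!(?:a|p|img)\b)\w*\b[^>]*>
Replace with an empty string. The \b inside the lookahead ensures that tags such as <pre> or <article>, whose names only begin with p or a, are removed as well.
var text = '<tr><p><img src="url" /> some text <img another></img><div><a>blablabla</a></div></p></tr>';
var output = text.replace(/<\/?(?!(?:a|p|img)\b)\w*\b[^>]*>/ig, '');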
console.log(output);
Upvotes: 8