Reputation: 2741
I'm cleaning up the output of a WYSIWYG editor that inserts an empty p tag instead of a line break, but it sometimes also creates other empty tags that aren't needed.
I have a regex that removes all empty tags, but I want to exclude empty p tags from it. How do I do that?
let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
s = s.trim().replace( /<(\w*)\s*[^\/>]*>\s*<\/\1>/g, '' )
console.log(s)
Upvotes: 0
Views: 109
Reputation: 14502
You can use DOMParser to be on the safe side.
let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
const parser = new DOMParser();
const doc = parser.parseFromString(s, 'text/html');
const elems = doc.body.querySelectorAll('*');
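// Remove every element that has no text content, except <p>.
// Note: text-less elements such as <img> or <br> also match this test.
[...elems].forEach(el => {
  if (el.textContent === '' && el.tagName !== 'P') {
    el.remove();
  }
});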
console.log(doc.body.innerHTML);
Upvotes: 1
Reputation: 68933
Add a negative lookahead, (?!p\b), to your regex. The \b matters: a bare (?!p) would also skip every other tag whose name starts with p, such as pre.
let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
s = s.trim().replace(/<(?!p\b)(\w*)\s*[^\/>]*>\s*<\/\1>/g, '');
console.log(s);
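If more than one tag should be kept, or empty tags can be nested, the lookahead approach can be wrapped in a small helper. This is only a sketch: the name removeEmptyTags and its keep parameter are made up for illustration, the pattern is slightly loosened compared with the one in the question (attributes may contain a slash), and it assumes reasonably regular markup, since a regex is not a real HTML parser.

```javascript
// Hypothetical helper (not from the question): removes empty tags except
// those whose names are listed in `keep`.
function removeEmptyTags(html, keep = ['p']) {
  // Negative lookahead that skips only the exact names in `keep`;
  // the \b stops "p" from also matching "pre", "picture", etc.
  const skip = keep.length ? `(?!(?:${keep.join('|')})\\b)` : '';
  const emptyTag = new RegExp(`<${skip}(\\w+)\\b[^>]*>\\s*<\\/\\1>`, 'gi');

  // Repeat until nothing changes, so that a wrapper which becomes empty
  // after its empty child is removed gets removed on the next pass.
  let previous;
  do {
    previous = html;
    html = html.replace(emptyTag, '');
  } while (html !== previous);
  return html;
}

console.log(removeEmptyTags('<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>'));
```

Calling removeEmptyTags(s, ['p', 'td']) would keep empty td tags as well; passing an empty array removes every empty tag.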
Upvotes: 1
Reputation: 6501
I understand that you want to use a regex for this, but there are better ways. Consider using DOMParser:
const x = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
const parse = new DOMParser();
const doc = parse.parseFromString(x, "text/html");
Array.from(doc.body.querySelectorAll("*"))
  .filter((d) => !d.hasChildNodes() && d.tagName.toUpperCase() !== "P")
  .forEach((d) => d.parentNode.removeChild(d));
console.log(doc.body.innerHTML);
// "<h1>test</h1><p>a</p><p></p>"
You can wrap the above in a function and modify as you like.
Upvotes: 1