totalnoob

Reputation: 2741

Regex: exclude a certain tag

I'm cleaning the output of a WYSIWYG editor that, instead of inserting a line break, creates an empty p tag. It sometimes also creates other empty tags that aren't needed.

I have a regex that removes all empty tags, but I want to exclude empty p tags from it. How do I do that?

let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";

s = s.trim().replace( /<(\w*)\s*[^\/>]*>\s*<\/\1>/g, '' )

console.log(s)

Upvotes: 0

Views: 109

Answers (3)

Matus Dubrava

Reputation: 14502

You can use DOMParser to be on the safe side.

let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";

const parser = new DOMParser();
const doc = parser.parseFromString(s, 'text/html');
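// querySelectorAll returns a static NodeList, so removing elements while looping is safe.
const elems = doc.body.querySelectorAll('*');

[...elems].forEach(el => {
  // Remove every element without text content, except <p>.
  // Note: elements such as <img> or <br> also have empty textContent.
  if (el.textContent === '' && el.tagName !== 'P') {
    el.remove();
  }
});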

console.log(doc.body.innerHTML);

Upvotes: 1

Mamun

Reputation: 68933

Add (?!p) right after the opening < in your regex. This is called a negative lookahead:

let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";

s = s.trim().replace( /<(?!p)(\w*)\s*[^\/>]*>\s*<\/\1>/g, '' )

console.log(s)
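
One caveat with (?!p): it rejects any tag whose name merely starts with "p", so an empty pre tag would be kept as well. A minimal sketch of a stricter variant follows, assuming that only the exact tag name should be excluded. The helper name removeEmptyTags and its keepTags parameter are hypothetical and not part of the original answer; the word boundary \b is what limits the lookahead to whole tag names.

```javascript
// Hypothetical helper: remove empty elements, but keep those whose tag name
// is listed in keepTags. Tag names are assumed to be plain word characters,
// so they are not regex-escaped here.
function removeEmptyTags(html, keepTags = ['p']) {
  const keep = keepTags.join('|');
  // (?!(?:p)\b) rejects <p> and <p class="x">, but not <pre>,
  // because \b requires the tag name to end right after "p".
  const pattern = new RegExp(
    `<(?!(?:${keep})\\b)(\\w+)\\s*[^\\/>]*>\\s*<\\/\\1>`,
    'gi'
  );
  return html.trim().replace(pattern, '');
}

const input = "<h1>test</h1><h1></h1><p>a</p><p></p><pre></pre><h2></h2>";
console.log(removeEmptyTags(input));
// "<h1>test</h1><p>a</p><p></p>"
```

Like the original regex, this only removes elements that are empty in the source string, so a tag that becomes empty after its empty child is removed needs a second pass.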

Upvotes: 1

ibrahim tanyalcin

Reputation: 6501

I understand that you want to use regex for that, but there are better ways. Consider using DOMParser:

var x = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
var parse = new DOMParser();
var doc = parse.parseFromString(x, "text/html");
// Select elements with no child nodes at all (whitespace text counts as a child), skipping <p>.
Array.from(doc.body.querySelectorAll("*"))
    .filter((d) => !d.hasChildNodes() && d.tagName.toUpperCase() !== "P")
    .forEach((d) => d.parentNode.removeChild(d));
console.log(doc.body.innerHTML);
//"<h1>test</h1><p>a</p><p></p>"

You can wrap the above in a function and modify as you like.

Upvotes: 1
