Reputation: 113
I want to delete empty tags such as <label></label>, <font> </font> so that:
<label></label><form></form>
<p>This is <span style="color: red;">red</span>
<i>italic</i>
</p>
will be cleaned as:
<p>This is <span style="color: red;">red</span>
<i>italic</i>
</p>
I have this RegEx in JavaScript, but while it deletes the empty tags, it also deletes this: "<i>italic</i></p>"
str=str.replace(/<[\S]+><\/[\S]+>/gim, "");
What am I missing?
Upvotes: 11
Views: 34588
Reputation: 2032
If you just want to remove all empty tags:
html = html.replace(/<([A-Za-z][A-Za-z0-9]*)([^>/]*)>\s*<\/\1>/gi, '');
But be careful: sometimes tables will then display incorrectly. So if you want to remove empty HTML tags except <tr> and <td>, use a callback in JavaScript:
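html = html.replace(/<([A-Za-z][A-Za-z0-9]*)([^>/]*)>\s*<\/\1>/gi, function(match, p1, p2) {
  // p1 is the tag name; keep empty table rows/cells so the layout is not broken
  if (p1.toLowerCase() === 'tr' || p1.toLowerCase() === 'td') {
    return match;
  } else {
    return '';
  }
});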
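A self-contained sketch of the callback approach above, run against the question's sample. The `removeEmptyTags` name and its `keep` parameter are hypothetical; the tag-name class is spelled `[A-Za-z][A-Za-z0-9]*` because `[A-z]` also matches the characters `[ \ ] ^ _` and the backtick that sit between `Z` and `a` in ASCII.

```javascript
// Hypothetical helper: removes <tag ...></tag> pairs whose content is only
// whitespace, except for tag names listed in `keep`.
function removeEmptyTags(html, keep = ["tr", "td"]) {
  // Group 1: tag name. Group 2: attributes (any characters except '>' and '/').
  // \1 requires the closing tag to repeat the opening tag's name; with the
  // `i` flag the back-reference is compared case-insensitively.
  const emptyPair = /<([A-Za-z][A-Za-z0-9]*)([^>/]*)>\s*<\/\1>/gi;
  return html.replace(emptyPair, (match, name) =>
    keep.includes(name.toLowerCase()) ? match : ""
  );
}

const input =
  '<label></label><form></form>\n<p>This is <span style="color: red;">red</span>\n<i>italic</i>\n</p>';
console.log(removeEmptyTags(input));
```

A single pass only removes the innermost empty pairs: `<div><span> </span></div>` becomes `<div></div>`, so nested empty elements need repeated passes until the string stops changing.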
Upvotes: 0
Reputation: 5333
<([^>]+)\s*>\s*<\/\1\s*>
<div>asdf</div>
<div></div> -- will match only this
<div></notdiv>
-- and this
<div >
</div >
Try it yourself at https://regexr.com/
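The same check can be run in Node instead of on regexr. The sample strings below are the ones listed above, plus one assumed extra case with an attribute:

```javascript
// The pattern from the answer above: group 1 captures everything inside the
// opening tag, and \1 requires the closing tag to repeat it exactly.
const emptyTag = /<([^>]+)\s*>\s*<\/\1\s*>/;

const samples = [
  "<div>asdf</div>",
  "<div></div>",
  "<div></notdiv>",
  "<div >\n</div >",
  '<div class="a"></div>',
];
for (const s of samples) {
  console.log(JSON.stringify(s), emptyTag.test(s));
}
```

Because group 1 captures the whole inside of the opening tag, any tag with attributes (like `<div class="a"></div>`) can never match: the closing tag would have to repeat the attributes too.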
Upvotes: 2
Reputation: 3298
Removing empty tags with cheerio (this also removes images):
$('*')
.filter(function(index, el) {
return (
$(el)
.text()
.trim().length === 0
)
})
.remove()
Removing empty tags with cheerio while keeping images:
$('*')
.filter(function(index, el) {
return (
el.tagName !== 'img' &&
$(el).find(`img`).length === 0 &&
$(el)
.text()
.trim().length === 0
)
})
.remove()
Upvotes: 0
Reputation: 30580
You have "not spaces" as your character class, which means "<i>italic</i></p>" will match. The first half of your regex will match "<(i>italic</i)>" and the second half "</(p)>". (I've used brackets to show what each [\S]+ matches.)
Change this:
/<[\S]+><\/[\S]+>/
To this:
/<[^/>][^>]*><\/[^>]+>/
Overall you should really be using a proper HTML processor, but if you're munging HTML soup this should suffice :)
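To see the difference, both patterns can be run over a one-line version of the question's markup (the `input` string is an assumption; the question's sample spans several lines):

```javascript
const input =
  '<label></label><form></form><p>This is <span style="color: red;">red</span><i>italic</i></p>';

// Original: [\S]+ can run across '>' and '<', so one "tag" can swallow several.
const before = input.replace(/<[\S]+><\/[\S]+>/gi, "");

// Suggested: the opening tag may not start with '/', and neither part may contain '>'.
const after = input.replace(/<[^/>][^>]*><\/[^>]+>/gi, "");

console.log(before); // the <i>italic</i> element is gone
console.log(after);  // only the empty <label> and <form> pairs are removed
```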
Upvotes: 25
Reputation: 8732
Here's a modern native JavaScript solution, which is actually quite similar to the jQuery one from 2010. I adapted it from that answer for a project I am working on, and thought I would share it here.
document.querySelectorAll("*:empty").forEach((x)=>{x.remove()});
document.querySelectorAll returns a NodeList, which is essentially an array-like list of all DOM elements that match the CSS selector passed to it as an argument.
*:empty is a selector which selects all elements (* means "any element") that are empty (which is what :empty means).
This will select any empty element within the entire document. If you only want to remove empty elements from within a certain part of the page (i.e. only those within some div element), you can add an id to that element and then use the selector #id *:empty, which means any empty element within the element with an id of id.
This is almost certainly what you want: technically, some important tags (e.g. <meta> tags, <br> tags, <img> tags, etc.) are "empty", so without specifying a scope, you will end up deleting some tags you probably care about.
forEach loops through every element in the resulting NodeList and runs the anonymous function (x)=>{x.remove()} on it. x is the current element in the list, and calling .remove() on it removes that element from the DOM.
Hopefully this helps someone. It's amazing to see how far JavaScript has come in just 8 years: from almost always needing a library to write something complex like this in a concise manner, to being able to do so natively.
So, the method detailed above will work fine in most circumstances, but it has two issues:
1. Tags that contain only whitespace, e.g. <div> </div>, are not treated as :empty (note the space in between). CSS Level 4 selectors fix this with the introduction of the :blank selector (which is like :empty except it ignores whitespace), but currently only Firefox supports it (in vendor-prefixed form).
2. Self-closing tags such as <br> and <img> are matched by :empty, and this will remain the case with :blank, too.
I have written a slightly larger function which deals with these two cases:
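document.querySelectorAll("*").forEach((x)=>{
  // Build the closing tag for this element, e.g. "</DIV>"
  let tagName = "</" + x.tagName.toUpperCase() + ">";
  // Remove only elements that end with a closing tag (i.e. are not self-closing)
  // and whose contents contain no non-whitespace character
  if (x.outerHTML.slice(-tagName.length).toUpperCase() === tagName
      && !/[^\s]/.test(x.innerHTML)) {
    x.remove();
  }
});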
We iterate through every element on the page. We grab that element's tag name (for example, if the element is a div this would be DIV) and use it to construct a closing tag, e.g. </DIV>.
That tag is 6 characters long. We check whether the upper-cased last 6 characters of the element's HTML match it. If they do, we continue. If they don't, the element doesn't have a closing tag, and therefore must be self-closing. This is preferable to a hard-coded list, because it means you don't have to update anything should a new self-closing tag be added to the spec.
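The closing-tag test can be tried outside the browser by passing plain strings in place of a live element's tagName and outerHTML. `hasClosingTag` is a hypothetical name used only for this sketch:

```javascript
// Returns true when `outerHTML` ends with the closing tag for `tagName`.
// slice(-n) takes the LAST n characters; slice(n) would drop the first n instead.
function hasClosingTag(tagName, outerHTML) {
  const closing = "</" + tagName.toUpperCase() + ">";
  return outerHTML.slice(-closing.length).toUpperCase() === closing;
}

console.log(hasClosingTag("DIV", '<div class="x"></div>')); // true
console.log(hasClosingTag("BR", "<br>"));                    // false: no closing tag
console.log(hasClosingTag("IMG", '<img src="a.png">'));      // false: no closing tag
```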
Then, we check whether the contents of the element contain any non-whitespace character. /[^\s]/ is a RegEx. [] is a set in RegEx, and will match any character that appears inside it. If ^ is the first element, the set becomes negated: it will match any character that is NOT in the set. \s means whitespace (tabs, spaces, line breaks). So what [^\s] says is "any character that is not whitespace".
Putting that together: if the tag is not self-closing and its contents contain no non-whitespace character (i.e. it is empty or whitespace-only), then we remove it.
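The negated-set test on its own, applied to a few made-up innerHTML strings:

```javascript
// [^\s] matches any single character that is NOT whitespace,
// so the test is true as soon as the string has one "real" character.
const hasNonWhitespace = (s) => /[^\s]/.test(s);

// An element whose innerHTML fails this test is empty or whitespace-only.
console.log(hasNonWhitespace(""));              // false
console.log(hasNonWhitespace(" \n\t "));        // false
console.log(hasNonWhitespace(" a "));           // true
console.log(hasNonWhitespace("<span></span>")); // true: child markup counts as content
```

The last case means a parent that only contains empty children is kept on this pass, because its innerHTML still holds the children's tags.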
Of course, this is a bit bigger and less elegant than the previous one-liner. But it should work for essentially every case.
Upvotes: 2
Reputation: 189
Found this on CodePen. It uses jQuery, but it does the job:
$('element').each(function() {
if ($(this).text() === '') {
$(this).remove();
}
});
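You will need to change 'element' to point at the part of the page where you want to remove empty tags. Do not point it at the whole document, because elements with no text, such as <img> and <br>, would be removed too (the issue described in Toastrackenigma's answer).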
Upvotes: 0
Reputation: 1
You can use this one:
text = text.replace(/<[^/>][^>]*>\s*<\/[^>]+>/gi, "");
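A quick check of this approach, written with the pattern as `/<[^/>][^>]*>\s*<\/[^>]+>/gi`, on an assumed one-line version of the question's markup that includes the `<font> </font>` case:

```javascript
// `*` after the attribute class lets the opening tag carry attributes;
// `\s*` between the tags lets whitespace-only content count as empty.
const emptyOrBlank = /<[^/>][^>]*>\s*<\/[^>]+>/gi;

const input =
  '<label></label><font> </font><p>This is <span style="color: red;">red</span><i>italic</i></p>';
const cleaned = input.replace(emptyOrBlank, "");
console.log(cleaned);
```

The closing tag's name is not compared with the opening one, so a mismatched pair such as `<b> </i>` is removed as well.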
Upvotes: 0
Reputation: 2176
Most of the regex answers only match
<label></label>
but not cases such as
<label> </label>
<label> </label>
<label>
</label>
Try this pattern to match all of the above:
<[^/>]+>[ \n\r\t]*</[^>]+>
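Used from a JavaScript regex literal, the `/` in `</` has to be escaped. A sketch testing the pattern against the label variants above plus one assumed non-empty case:

```javascript
// Same pattern as above, with '</' escaped as '<\/' for a regex literal.
const blankPair = /<[^/>]+>[ \n\r\t]*<\/[^>]+>/g;

const cases = [
  "<label></label>",
  "<label> </label>",
  "<label>\t</label>",
  "<label>\n</label>",
  "<label>x</label>", // not empty, must survive
];
for (const c of cases) {
  // true means the whole string was removed as an empty/blank pair
  console.log(JSON.stringify(c), c.replace(blankPair, "") === "");
}
```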
Upvotes: 9
Reputation: 5543
I like MattMitchell's jQuery solution, but here is another option using native JavaScript.
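function CleanChildren(elem)
{
  var children = elem.childNodes;
  // childNodes is a live list, so walk it backwards: removing a node
  // then does not shift the indexes that are still to be visited
  for (var i = children.length - 1; i >= 0; i--)
  {
    var child = children[i];
    // Only element nodes are candidates; text nodes are kept as content
    if (child.nodeType !== Node.ELEMENT_NODE)
      continue;
    // Clean the subtree first, so an element emptied by this pass is removed too
    CleanChildren(child);
    if (!child.hasChildNodes())
      elem.removeChild(child);
  }
}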
Upvotes: 2
Reputation: 41823
Regex is not for HTML. If you're in JavaScript anyway, I'd encourage you to use jQuery DOM processing.
Something like:
$('*:empty').remove();
Alternatively:
$("*").filter(function()
{
  // Keep only elements whose HTML is empty or whitespace-only
  return $.trim($(this).html()).length === 0;
}).remove();
Upvotes: 23
Reputation: 18350
This is an issue of greedy regex. Try this:
str=str.replace(/<[^>]+><\/[\S]+>/gim, "");
or
str=str.replace(/<[\S]+?><\/[\S]+>/gim, "");
In your regex, <[\S]+> matches <i>italic</i> and the <\/[\S]+> matches the </p>
Upvotes: 1
Reputation: 881735
You need /<[\S]+?><\/[\S]+?>/ -- the difference is the ?s after the +s, to match "as few as possible" (AKA "non-greedy match") non-space characters (though 1 or more), instead of the bare +s, which match "as many as possible" (AKA "greedy match").
Avoiding regular expressions altogether, as the other answer recommends, is also an excellent idea, but I wanted to point out the important greedy vs non-greedy distinction, which will serve you well in a huge variety of situations where regexes are warranted.
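The greedy vs non-greedy distinction in isolation, on a made-up string:

```javascript
const s = "<b>bold</b>";

// Greedy: .+ grabs as much as possible, then backs off just enough to find a '>'.
console.log(s.match(/<.+>/)[0]);  // "<b>bold</b>"

// Lazy: .+? grabs as little as possible and stops at the first '>'.
console.log(s.match(/<.+?>/)[0]); // "<b>"

// Laziness only changes which match is preferred: if the shortest candidate
// fails, the engine keeps extending it, so a match can still span several tags.
console.log("<b>bold</b>x".match(/<.+?>x/)[0]); // "<b>bold</b>x"
```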
Upvotes: 3