Reputation: 9184
For example, I have this HTML:
<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper">< div class="modal-content"><div class="modal-body full long"><div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>
How can I remove all attributes (style, class, id, etc.) from this HTML?
I have this regex:
/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i
What is wrong with it? How can I delete all HTML attributes with a regex?
Here is a fiddle:
http://jsfiddle.net/qL4maxn0/1/
Upvotes: 1
Views: 5990
Reputation: 41852
You should not use a regex here. Let the browser parse the HTML and remove the attributes through the DOM instead:
var html = '<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper"><div class="modal-content"><div class="modal-body full long"> <div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li> </ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div><div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>';
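// Parse the markup into a detached wrapper element.
var div = document.createElement('div');
div.innerHTML = html;

// Remove every attribute from a single element.
// Iterate backwards because `attributes` is a live collection.
function removeAllAttrs(element) {
    for (var i = element.attributes.length; i-- > 0;) {
        element.removeAttributeNode(element.attributes[i]);
    }
}

// Recursively strip attributes from all descendants of `el`.
function removeAttributes(el) {
    var children = el.children;
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        removeAllAttrs(child);
        if (child.children.length) {
            removeAttributes(child);
        }
    }
}

removeAttributes(div);
console.log(div.innerHTML);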
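As a variation on the same DOM approach, here is a minimal sketch that walks the tree with an explicit stack instead of recursion and uses removeAttribute by name. The function name stripDescendantAttributes and the sample markup are made up for illustration; the snippet only assumes that each node exposes children, attributes, and removeAttribute, as DOM elements do.

```javascript
// Remove every attribute from all descendants of `root` (root itself is left alone).
// Hypothetical helper; relies only on `children`, `attributes`, `removeAttribute`.
function stripDescendantAttributes(root) {
    // Explicit stack instead of recursion, so very deep nesting cannot
    // exhaust the call stack.
    var stack = Array.prototype.slice.call(root.children);
    while (stack.length > 0) {
        var el = stack.pop();
        // Copy the names first: `attributes` is a live collection in the DOM
        // and shrinks while attributes are being removed.
        var names = Array.prototype.map.call(el.attributes, function (attr) {
            return attr.name;
        });
        names.forEach(function (name) {
            el.removeAttribute(name);
        });
        // Queue this element's children for processing.
        Array.prototype.push.apply(stack, Array.prototype.slice.call(el.children));
    }
    return root;
}

// Browser-only usage; skipped where no DOM is available (e.g. plain Node.js).
if (typeof document !== 'undefined') {
    var wrapper = document.createElement('div');
    wrapper.innerHTML = '<p class="a" id="b">hi <b style="color:red">there</b></p>';
    console.log(stripDescendantAttributes(wrapper).innerHTML);
}
```

The order in which elements are visited differs from the recursive version, but since every descendant is visited exactly once the result should be the same.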
Upvotes: 4
Reputation: 477684
First of all, I would advise against using regexes in this situation; they are not meant to parse tree-shaped structures like HTML.
If you don't have a choice, however, I think a regex can work for this particular problem.
It looks to me like you forgot to allow for spaces, accents, etc. You can rely on the fact that the greater-than (>) and less-than (<) signs are expected to be escaped in raw text:
/<\s*([a-z][a-z0-9]*)\s.*?>/gi
and call it with:
result = body.replace(regex, '<$1>')
For your given sample, it produces:
<title>Ololo - text’s life</title><div><div><div><div><div><div><div>olololo<ul><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p>bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div></div></div>
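To make the behaviour and the limits of this regex concrete, here is a small sketch that runs it on two short inputs. The helper name stripAttrsWithRegex and both sample strings are made up for illustration. The second input shows what happens when an attribute value contains an unescaped >, which is the case the regex cannot handle.

```javascript
// The regex from the answer above: an optional space after '<', the tag name,
// at least one whitespace character, then everything up to the next '>'.
var regex = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

// Hypothetical helper wrapping the replace call shown above.
function stripAttrsWithRegex(html) {
    return html.replace(regex, '<$1>');
}

// Works for well-behaved markup, including the "< div" variant with a space.
var ok = stripAttrsWithRegex('<p class="a" id="b">text</p>< div style="x">y</div>');
console.log(ok); // <p>text</p><div>y</div>

// Breaks when an attribute value contains a raw '>': the match stops too early.
var broken = stripAttrsWithRegex('<a title="a > b" href="#">link</a>');
console.log(broken); // <a> b" href="#">link</a>
```

Note also that . does not match line breaks by default, so a tag whose attributes span several lines would be left unchanged by this pattern.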
Upvotes: 6
Reputation: 120586
You're missing the g flag to make the replace global:
/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig
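A short sketch of the difference the g flag makes. The sample string and the replacement '<$1$2>' are assumptions for illustration (the replacement string is not shown in the question); $2 keeps the optional trailing slash of self-closing tags.

```javascript
// Sample input, made up for illustration.
var html = '<p class="a"><b id="x">hi</b><br class="y"/></p>';

// Without g, only the first matching tag is rewritten.
var once = html.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i, '<$1$2>');
console.log(once); // <p><b id="x">hi</b><br class="y"/></p>

// With g, every opening or self-closing tag is rewritten.
var all = html.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig, '<$1$2>');
console.log(all); // <p><b>hi</b><br/></p>
```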
Also, if you're doing this for security purposes, look into using a proper HTML sanitizer; see "Sanitize/Rewrite HTML on the Client Side".
Upvotes: 1