Jimmy Page

Reputation: 343

A regex to remove id, style, class attributes from HTML tags in JS

I have an HTML string in JavaScript, and I want to use a regex to remove the id, style and class attributes from its tags. For example, I have:

New York City.<div style="padding:20px" id="upp" class="upper"><div style="background:#F2F2F2; color:black; font-size:90%; padding:10px 10px; width:500px;">This message is.</div></div>

I want this string to become:

New York City.<div><div>This message is.</div></div>

Upvotes: 5

Views: 14360

Answers (8)

Use a regular expression. It is fast at run time and quick to write. Note that this pattern strips every attribute from every tag, not only id, style and class.

htmlCode = htmlCode.replace(/<([^ >]+)[^>]*>/ig,'<$1>');
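If only id, style and class should go while other attributes stay, the same idea can be narrowed. The following is a minimal sketch, not a robust HTML parser: the helper name stripAttrs and its names parameter are made up for illustration, and it assumes simple, well-formed markup with quoted attribute values, no ">" inside a value, and plain attribute names that need no regex escaping.

```javascript
// Hypothetical helper: removes only the listed attributes from opening tags.
// Assumes quoted attribute values and no '>' inside any value.
function stripAttrs(html, names) {
  // whitespace + attribute name + "=" + a double- or single-quoted value
  var attr = new RegExp(
    '\\s+(?:' + names.join('|') + ')\\s*=\\s*(?:"[^"]*"|\'[^\']*\')',
    'gi'
  );
  // Only rewrite the text of opening tags, never the content between tags.
  return html.replace(/<[a-z][^>]*>/gi, function (tag) {
    return tag.replace(attr, '');
  });
}

var sample = '<a href="x" class="y">link</a>';
var cleaned = stripAttrs(sample, ['id', 'style', 'class']);
// cleaned is '<a href="x">link</a>'
```

Because the name must directly follow whitespace, an attribute such as data-id is left alone. Markup that breaks the stated assumptions is better handled through the DOM.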

Upvotes: 1

Devplex

Reputation: 257

I don't know about RegEx, but I sure as hell know about jQuery.

Convert the given HTML string into a DOM element, parse it, and return its contents.

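function cleanStyles(html){
    // Parse the string inside a detached div so the live page is not touched
    var temp = $(document.createElement('div'));
    temp.html(html);

    // Remove all three attributes from the question, not only style
    temp.find('*').removeAttr('id').removeAttr('style').removeAttr('class');
    return temp.html();
}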

Upvotes: 0

Cem Yıldız

Reputation: 142

I used this:

var html = 'New York City.<div style="padding:20px" id="upp" class="upper"><div style="background:#F2F2F2; color:black; font-size:90%; padding:10px 10px; width:500px;">This message is.</div></div>';

// Removes every double-quoted attribute except those named in attrs (a whitelist).
function clear_attr(str, attrs) {
    var attrReg = /\s*(\w+)="[^"]*"/g;  // one attribute: name="value"
    var tagReg = /<\s*(\w+).*?>/g;      // one opening tag
    return str.replace(tagReg, function (tag) {
        return tag.replace(attrReg, function (attr, name) {
            // Keep whitelisted attributes, drop everything else
            return attrs.indexOf(name) >= 0 ? attr : '';
        });
    });
}
// An empty whitelist removes all attributes
var result = clear_attr(html, []);

Upvotes: 1

You

Reputation: 23774

Instead of parsing the HTML using regular expressions, which is a bad idea, you could take advantage of the DOM functionality that is available in all browsers. We need to be able to walk the DOM tree first:

var walk_the_DOM = function walk(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walk(node, func);
        node = node.nextSibling;
    }
};

Now parse the string and manipulate the DOM:

var wrapper= document.createElement('div');
wrapper.innerHTML= '<!-- your HTML here -->';
// Start at the wrapper itself: starting at wrapper.firstChild would skip its siblings
walk_the_DOM(wrapper, function(element) {
    // Text and comment nodes have no removeAttribute, so skip them
    if(element.removeAttribute) {
        element.removeAttribute('id');
        element.removeAttribute('style');
        element.removeAttribute('class');
    }
});
var result = wrapper.innerHTML;

See also this JSFiddle.

Upvotes: 11

David Thomas

Reputation: 253318

If you just want to remove the attributes, then regex is the wrong tool. I'd suggest, instead, removing them through the DOM (this strips every attribute from the given element):

function stripAttributes(elem){
    if (!elem) {
        return false;
    }
    else {
        var attrs = elem.attributes;
        while (attrs.length) {
            elem.removeAttribute(attrs[0].name);
        }
    }
}

var div = document.getElementById('test');

stripAttributes(div);

JS Fiddle demo.

Upvotes: 1

kennebec

Reputation: 104770

If you are willing to remove everything but the div tag names:

string=string.replace(/<(div)[^>]+>/ig,'<$1>');

This will return <DIV> if the HTML is uppercase.
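To illustrate the case behaviour, here is a small sketch. The input string and the lowercasing variant are assumptions added for illustration, not part of the answer; note that closing tags are never matched by this pattern, so they keep whatever case they had.

```javascript
var input = '<DIV STYLE="color:red">Hi</DIV><div id="a">there</div>';

// The captured tag name is reused as typed, so its case is preserved.
var sameCase = input.replace(/<(div)[^>]+>/ig, '<$1>');
// sameCase is '<DIV>Hi</DIV><div>there</div>'

// Variant (an assumption): lowercase the opening tag name while stripping.
var lowerCase = input.replace(/<(div)[^>]+>/ig, function (match, name) {
  return '<' + name.toLowerCase() + '>';
});
// lowerCase is '<div>Hi</DIV><div>there</div>' (closing tags are untouched)
```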

Upvotes: 3

RobG

Reputation: 147373

A plain script solution would be something like:

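function removeProperties(markup) {
  var div = document.createElement('div');
  div.innerHTML = markup;
  var el, els = div.getElementsByTagName('*');

  for (var i=0, iLen=els.length; i<iLen; i++) {
    el = els[i];
    // removeAttribute drops the attribute entirely; assigning '' to
    // el.id or el.className can leave id="" and class="" in the markup
    el.removeAttribute('id');
    el.removeAttribute('style');
    el.removeAttribute('class');
  }
  // Return the cleaned markup. To add the elements to the DOM instead,
  // move them out of the temporary div:
  //   while (div.firstChild) { someElement.appendChild(div.firstChild); }
  return div.innerHTML;
}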

A more general solution would take the attribute names as extra arguments, or as a space-separated string, and then iterate over the names to remove them.
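A minimal sketch of that generalisation, assuming the names arrive as a space-separated string. The helper name removeAttributesFrom is hypothetical; it accepts any array-like collection of objects that expose removeAttribute, such as the result of getElementsByTagName('*'), which also keeps it independent of a browser.

```javascript
// Hypothetical helper: removes each attribute named in the space-separated
// string `names` from every item in `elements` (any array-like collection
// whose items expose a removeAttribute method).
function removeAttributesFrom(elements, names) {
  // Split on runs of whitespace and discard empty entries
  var list = names.split(/\s+/).filter(Boolean);
  for (var i = 0; i < elements.length; i++) {
    for (var j = 0; j < list.length; j++) {
      elements[i].removeAttribute(list[j]);
    }
  }
  return elements;
}

// Browser usage (assumes a div already filled through innerHTML):
//   removeAttributesFrom(div.getElementsByTagName('*'), 'id style class');
```

Removing an attribute that is not present is harmless with the standard DOM method, so the names need not be checked first.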

Upvotes: 0

Elias Zamaria

Reputation: 101073

Trying to parse HTML with regexes will cause problems. This answer may be helpful in explaining them. If you are using jQuery, you may be able to do something like this:

var transformedHtml = $(html).find("*").removeAttr("id").removeAttr("style").removeAttr("class").outerHTML()

For this to work, you need to be using the outerHTML plugin described here.

If you don't want to use jQuery, it will be trickier. These questions may have some helpful answers on how to convert the string to a collection of DOM elements: Converting HTML string into DOM elements?, Creating a new DOM element from an HTML string using built-in DOM methods or prototype. You may be able to loop through the elements and remove the attributes using the built-in removeAttribute method. I don't have the time or motivation to figure out all the details for you.

Upvotes: 0
