Martin AJ

Reputation: 6707

How can I strip HTML tags that have attribute(s) from string?

I have a question-and-answer website like SO. I also have a textarea with a live preview under it (exactly like SO's). I use a markdown library to convert some symbols to HTML tags. For example, that JS library replaces ** with <b>. So far, so good.

Now I need to remove HTML elements that have attributes. I can do that in PHP like this:

<?php

$data = <<<DATA
<div>
    <p>This line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
</div>
DATA;

$dom = new DOMDocument();
// LIBXML_HTML_NOIMPLIED: don't wrap the fragment in implied <html>/<body> elements
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);

$xpath = new DOMXPath($dom);

// every element that has at least one attribute
$lines_to_be_removed = $xpath->query("//*[count(@*)>0]");

// detach each matched element from its parent
foreach ($lines_to_be_removed as $line) {
    $line->parentNode->removeChild($line);
}

// just to check
echo $dom->saveHTML($dom->documentElement);
?>

I'm not sure the code above is the best, but as you can see (in the fiddle I've linked) it works as expected: it removes the nodes that have at least one attribute. Now I need to do the same in JS (or jQuery), for that textarea preview. How can I do that? Do I need a regex?

Upvotes: 1

Views: 764

Answers (2)

Brian Peacock

Reputation: 1849

The JavaScript element.attributes property returns a live NamedNodeMap of an element's attributes; each entry is an Attr node with .name and .value properties. For example...

HTML

<div class="cls" id="id" title="divtitle">
    <!-- content ... -->
</div>

JavaScript

var div = document.getElementById('id');
var attr = div.attributes;

console.log(attr);
/* => 
NamedNodeMap [class="cls", id="id", title="divtitle"]
*/
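Because the map is array-like, Array.from can turn it into a plain array, which makes the .name/.value pairs easy to read. A minimal sketch; describeAttributes is a hypothetical helper name, not a DOM API:

```javascript
// Collect "name=value" strings from an element's attributes.
// Array.from accepts any array-like, including the NamedNodeMap from el.attributes,
// and its second argument maps each Attr node as it is copied.
function describeAttributes(el) {
  return Array.from(el.attributes, function (attr) {
    return attr.name + '=' + attr.value;
  });
}

// Usage in a browser, with the HTML above:
// console.log(describeAttributes(document.getElementById('id')));
```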

This can be used to filter the selected elements. Something like this for your example:

/* return an array of the <p> elements and the nested <div> from querySelectorAll */
var paras = Array.prototype.slice.call(
       document.querySelectorAll('div p, div div')
);

/* loop through paras */
paras.forEach(function(p) {
    /* 'p' = each element in 'paras' */

    /* get attributes of 'p' */
    var attr = p.attributes;

    /* only check elements with attributes */
    if (attr.length !== 0) {

        /* loop through attributes; some() stops at the first match,
           so an element is never removed twice */
        Object.keys(attr).some(function(a) {
            /* apply conditional (values must match the markup exactly) */
            if ((attr[a].name === 'class' && attr[a].value === 'myclass') ||
                (attr[a].name === 'style' && attr[a].value === 'color: red')) {

                /* remove element ('p') */
                p.parentElement.removeChild(p);
                return true;
            }
            return false;
        });
    }
});

Because a NamedNodeMap is an array-like object, Object.keys(attr) returns its index keys ("0", "1", ...), so attr[a] inside the loop is each Attr node, and I read its .name and .value properties.
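An alternative sketch that avoids Object.keys: Array.from turns the map into an array of Attr nodes, and some() stops at the first matching attribute, so each element is removed at most once. The rules list and both function names are hypothetical, and Element.remove() assumes a reasonably modern browser:

```javascript
// Hypothetical rule list: remove elements carrying any of these exact name/value pairs.
var rules = [
  { name: 'class', value: 'myclass' },
  { name: 'style', value: 'color: red' }
];

// True if any of el's attributes matches one of the rules.
function hasMatchingAttribute(el, rules) {
  return Array.from(el.attributes).some(function (attr) {
    return rules.some(function (rule) {
      return attr.name === rule.name && attr.value === rule.value;
    });
  });
}

// Remove every matching descendant of root.
// querySelectorAll returns a static NodeList, so removing nodes while looping is safe.
function removeMatching(root, rules) {
  Array.from(root.querySelectorAll('*')).forEach(function (el) {
    if (hasMatchingAttribute(el, rules)) {
      el.remove();
    }
  });
}

// Usage in a browser (the selector is an assumption about your markup):
// removeMatching(document.querySelector('div'), rules);
```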

EDIT: In light of the comment above:

If you just want to remove the attributes (and keep the elements), drop the condition and remove each attribute by name, like so...

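paras.forEach(function(p) {
    /* copy the names first: the NamedNodeMap is live and shrinks as attributes are removed */
    var names = Array.prototype.map.call(p.attributes, function(a) {
        return a.name;
    });
    names.forEach(function(name) {
        p.removeAttribute(name);
    });
});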
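The snippet above only touches the elements in paras. To strip attributes from every element inside a container, Element.getAttributeNames() returns a plain array of names, which sidesteps iterating a live map. A sketch; stripAttributesUnder and the #preview id are hypothetical:

```javascript
// Strip every attribute from every element inside root, keeping the elements themselves.
// getAttributeNames() returns a plain array (not a live map), so removing while iterating is safe.
function stripAttributesUnder(root) {
  Array.from(root.querySelectorAll('*')).forEach(function (el) {
    el.getAttributeNames().forEach(function (name) {
      el.removeAttribute(name);
    });
  });
}

// Usage in a browser; '#preview' is a hypothetical id for the preview container:
// stripAttributesUnder(document.getElementById('preview'));
```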


Upvotes: 2

BryanGrezeszak

Reputation: 577

You could do something like this:

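// '.myTextArea' should be the preview container: a <textarea>'s content is plain text, not child elements
$('.myTextArea *').each(function(){
    if (this.attributes.length)
        $(this).remove();
});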


It's not the most efficient, but for a textarea preview it should be fine. I'd still recommend running it as rarely as possible. As far as I know, no CSS or jQuery selector matches "any element with any attribute", so you have to make the JS do the work.
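CSS and jQuery selectors can't express "has any attribute", but XPath can, and browsers expose it through document.evaluate. The relative expression .//*[@*] selects the same elements as the question's PHP query //*[count(@*)>0], restricted to one container. A sketch; the function name is hypothetical and root is assumed to be the preview container element:

```javascript
// Remove every element under root that has at least one attribute, using XPath.
function removeElementsWithAttributes(root) {
  var snapshot = root.ownerDocument.evaluate(
    './/*[@*]',                              // descendants of root with any attribute
    root,                                    // context node
    null,                                    // no namespace resolver
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,  // static snapshot: safe to modify the DOM afterwards
    null
  );
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var el = snapshot.snapshotItem(i);
    // If an ancestor was already removed, this just detaches el from that detached ancestor.
    el.remove();
  }
}

// Usage in a browser; '#preview' is a hypothetical id:
// removeElementsWithAttributes(document.getElementById('preview'));
```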


Edit based on comment:

To remove only the surrounding tag and keep the element's text, do something like this:

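$('.myTextArea *').each(function(){
    if (this.attributes.length) {
        // insert the text as a Text node; assigning it to outerHTML would re-parse it as HTML
        $(this).replaceWith(document.createTextNode(this.textContent));
    }
});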
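If the attributed elements can contain other markup (say a <b> produced by the markdown library), replacing them with their text flattens that markup too. A plain-DOM sketch that unwraps instead, moving the children up one level before removing the element; the function name is hypothetical and root is assumed to be the preview container:

```javascript
// Remove only the tag of each element that has attributes, keeping its child nodes
// (text and nested markup) in place.
function unwrapElementsWithAttributes(root) {
  Array.from(root.querySelectorAll('*')).forEach(function (el) {
    if (el.attributes.length && el.parentNode) {
      // Move every child in front of el, then drop the now-empty el.
      while (el.firstChild) {
        el.parentNode.insertBefore(el.firstChild, el);
      }
      el.parentNode.removeChild(el);
    }
  });
}
```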


Upvotes: 2
