Martin AJ

Reputation: 6697

How can I strip only HTML tags, not their contents?

I'm trying to remove every HTML tag that has at least one attribute from a string, while keeping the tag's contents. Suppose I have this string:

<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
    <div style="color: red">and <p>also</p> this</div>
    <div style="color: red">and this <div style="color: red">too</div></div>
</div>

I want this output:

<div>
    <p>These line shall stay</p>
    Remove this one
    <p>But keep this</p>
    and this
    and <p>also</p> this
    and this too
</div>

How can I do that?


I can already do this in PHP:

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    $parent->removeChild($node);
}

echo $dom->saveHTML();

As you can see, this works well, but now I need to do the same thing in JavaScript (or jQuery). How can I do that? Here is what I've tried so far:

$('.myTextArea *').each(function(){
    if (this.attributes.length)
        $(this).remove();
});

Upvotes: 0

Views: 69

Answers (3)

trincot

Reputation: 349946

You could do it with this function, which follows almost the same logic as your PHP code. In addition, it leaves anything inside a <pre> element untouched:

function cleanHtml(html) {
    var $doc = $('<span>' + html + '</span>');
    $('*', $doc).each(function (index, el) {
        // skip <pre> and its descendants; unwrap only elements that have attributes
        if (!$(el).parents().addBack().is('pre') &&
                el.hasAttributes()) {
            while ($(el).contents().length) {
                $(el).contents().last().insertAfter(el);
            }
            $(el).remove();
        }
    });
    return $doc.html();
}

// I/O for snippet
$('button').click (function () {
    // get HTML from input textarea
    var dirtyHtml = $('.myTextArea').val();
    // clean it
    var html = cleanHtml(dirtyHtml);
    // put cleaned HTML back in textarea
    $('.myTextArea').val(html);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea class="myTextArea" style="width:100%" rows=10>  
<div>
    <p>These line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p>But keep this</p>
    <div style="color: red">and this</div>
    <pre>do not touch <div class="myclass">this div in code</div></pre>
    <div style="color: red">and <p>also</p> this</div>
    <div style="color: red">and this <div style="color: red">too</div></div>
</div>
</textarea>

<button>Clean</button>
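
If no DOM is available (for example, when the string is processed outside a browser), the same idea can be approximated on the string itself. The following is only a minimal sketch, not a replacement for a real HTML parser: it scans tags with a regular expression and uses a stack to pair each closing tag with its opening tag. It assumes well-formed, properly nested markup with no `>` inside attribute values and no comments or scripts, and it does not replicate the <pre> exception above. The names `unwrapAttributed` and `VOID_TAGS` are hypothetical.

```javascript
// Elements that never have a closing tag, so they must not be pushed on the stack.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Hypothetical helper: drops every tag that carries attributes, keeps its contents.
// Assumption: the markup is well-formed and attribute values contain no '>'.
function unwrapAttributed(html) {
    // One boolean per currently open element: was its opening tag kept?
    const keepStack = [];
    // Groups: (1) optional slash of a closing tag, (2) tag name, (3) rest of the tag.
    return html.replace(/<(\/?)([a-zA-Z][\w-]*)([^>]*)>/g, function (tag, slash, name, rest) {
        if (slash) {
            // Closing tag: keep it only if its opening tag was kept.
            // A closing tag with no open element is left as-is.
            return (keepStack.length === 0 || keepStack.pop()) ? tag : '';
        }
        // Anything besides whitespace or a trailing slash counts as an attribute.
        const hasAttributes = rest.replace(/\/$/, '').trim() !== '';
        if (!VOID_TAGS.has(name.toLowerCase())) {
            keepStack.push(!hasAttributes);
        }
        return hasAttributes ? '' : tag;
    });
}
```

For example, `unwrapAttributed('<div style="color: red">and <p>also</p> this</div>')` returns `and <p>also</p> this`. Where a DOM is available, the DOM-based function above is the more robust choice, since it does not depend on these assumptions about the markup.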

Upvotes: 2

Derek Story

Reputation: 9583

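You could get the element's text and replace the entire element with it. Note that .text() returns only the text, so any tags nested inside the element (such as <p>also</p>) are dropped too: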

JS Fiddle

$('.myTextArea *').each(function(){
    if(this.attributes.length) {
      var string = $(this).text();
      $(this).replaceWith(string)
    }
});

Upvotes: 1

Dario

Reputation: 6270

This should work:

$('.myTextArea *').each(function() {
    while(this.attributes.length > 0) {
       this.removeAttribute(this.attributes[0].name);
    }
});

It loops through each element's attributes and removes them one by one. Note that the tags themselves stay in place; only their attributes are stripped.

Upvotes: 0
