Caleb Park

Reputation: 31

Get text from DOM excluding script tags

I want to get only the text from the following HTML document, without the contents of the <script> tag.

<html>
  <body>
    <script>
      a = 0;
    </script>
   <div>TEST</div>
   <p>test</p>
  </body>
</html>

I have the following code:

$('body').text()

This currently gets the result:

a = 0; TEST test

But I am trying to get the result:

TEST test

Upvotes: 3

Views: 1946

Answers (4)

musefan

Reputation: 48415

First, you can select all the non-script children of the body with the following code:

var elements = $('body').children().not('script');

Now you could just do the following to get all the text:

var text = elements.text();

However, this produces no spaces between the text of the individual elements, i.e. TESTtest. If that is what you want, stop here.

But if you want the spaces, you can loop over the elements and build a string:

var text = "";
elements.each(function(){
    text += $(this).text() + " ";
});
text = text.trim();

Note that this solution does not preserve line breaks, which I assume is what you want based on your question.
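
For comparison, here is a minimal sketch without jQuery that walks the DOM recursively. Unlike selecting only the direct children of the body, it should also skip scripts nested deeper in the tree and keep text placed directly in the body. The helper name collectText is hypothetical, not part of any library; it relies only on the standard nodeType, nodeName, nodeValue and childNodes properties.

```javascript
// Hypothetical helper (not from the answers above): gathers the text of a
// node's descendants, skipping everything inside <script> elements.
function collectText(node, parts = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === 3) {
      // 3 === Node.TEXT_NODE: keep non-empty text, trimmed
      const text = child.nodeValue.trim();
      if (text) parts.push(text);
    } else if (child.nodeType === 1 && child.nodeName.toUpperCase() !== 'SCRIPT') {
      // 1 === Node.ELEMENT_NODE: recurse into every element except <script>
      collectText(child, parts);
    }
  }
  return parts;
}

// Browser usage (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined' && document.body) {
  console.log(collectText(document.body).join(' '));
}
```

Because the pieces are joined with a single space, running this once the page in the question has loaded should give TEST test. Line breaks are not preserved here either.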

Upvotes: 1

Orr Siloni

Reputation: 1308

This is probably not a perfect solution, but it should be good enough for simple HTML pages:

$('<div>').html($('body').html()).find('script').remove().end().text()

Explanation: it creates a detached div element, copies the HTML content of the body into it, removes all script tags from the div, and finally gets the text content.
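
The text obtained this way still contains the line breaks and indentation that sat between the elements in the markup. A minimal sketch of collapsing that whitespace into single spaces follows; collapseWhitespace is a hypothetical helper name, and the sample string is only an assumption about roughly what the text looks like for the markup in the question.

```javascript
// Hypothetical helper: collapse every run of whitespace into one space
// and strip leading/trailing whitespace.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Assumed sample: approximately the text left after the scripts are removed
const raw = '\n    \n   TEST\n   test\n  ';
console.log(collapseWhitespace(raw)); // prints "TEST test"
```

In the one-liner above, this could be applied to the string returned by .text().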

Upvotes: 1

Mr. Alien

Reputation: 157334

Since you edited your question: if you want to extract the text from the page without the script tags, you can clone the body, remove the scripts from the clone, and read its text:

let cloneBody = $('body').clone().find('script').remove().end();

console.log(cloneBody.text().trim());

HTML used for the demo:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
  var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>

Upvotes: 3

Cagy79

Reputation: 1620

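You can do this with plain JavaScript, as shown in a previous answer (Removing all script tags from html with JS Regular Expression). The function below returns the HTML with the script elements removed; return div.textContent instead of div.innerHTML if you want only the text.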

function stripScripts(s) {
    // Parse the HTML string into a detached container
    var div = document.createElement('div');
    div.innerHTML = s;
    // getElementsByTagName returns a live collection, so iterate backwards
    var scripts = div.getElementsByTagName('script');
    var i = scripts.length;
    while (i--) {
        scripts[i].parentNode.removeChild(scripts[i]);
    }
    return div.innerHTML;
}

alert(
 stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);

Upvotes: 2
