Reputation: 31
I want to get only the text from the following HTML document, without the contents of the <script> tag.
<html>
<body>
<script>
a = 0;
</script>
<div>TEST</div>
<p>test</p>
</body>
</html>
I have the following code:
$('body').text()
This currently gets the result:
a = 0; TEST test
But I am trying to get the result:
TEST test
Upvotes: 3
Views: 1946
Reputation: 48415
First of all, you can get all the non-script elements with the following code:
var elements = $('body').children().not('script');
Now you could just do the following to get all the text:
var text = elements.text();
However, this will result in no spaces between text nodes, i.e. TESTtest. If this is what you want then great, stop here.
But if you want the spaces, you can loop the elements and build a string:
var text = "";
elements.each(function () {
  // Append each element's text followed by a separating space
  text += $(this).text() + " ";
});
text = text.trim();
Note that this solution does not preserve line breaks; based on your question, I have assumed you do not need them.
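The loop above can also be sketched without jQuery, relying only on the standard DOM node properties nodeType, nodeName, childNodes and nodeValue. The function name textWithoutScripts is hypothetical and not part of any answer here. Unlike .children(), this sketch walks all descendants, so nested script tags are skipped as well:

```javascript
// Hypothetical helper (not from the answers above): collects the text of a DOM
// subtree, skips <script> elements, and joins the pieces with single spaces.
function textWithoutScripts(root) {
  var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
  var TEXT_NODE = 3;    // same value as Node.TEXT_NODE
  var parts = [];

  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      var value = node.nodeValue.trim();
      if (value) parts.push(value); // ignore whitespace-only text nodes
      return;
    }
    if (node.nodeType === ELEMENT_NODE &&
        node.nodeName.toUpperCase() === 'SCRIPT') {
      return; // skip the script element and everything inside it
    }
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  })(root);

  return parts.join(' ');
}

// Assumed usage in a browser: textWithoutScripts(document.body)
```

One trade-off of this design: every text node becomes its own space-separated piece, so inline markup such as <b>foo</b>bar would come out as "foo bar".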
Upvotes: 1
Reputation: 1308
This is probably not a perfect solution, but it should be good enough for simple HTML pages:
$('<div>').html($('body').html()).find('script').remove().end().text()
Explanation: this creates a detached div element, copies the HTML content of the body into it, removes all script tags from the div, and finally gets the text content.
Upvotes: 1
Reputation: 157334
OK, since you edited your question: if you want to extract the text from the page but not from script tags, you can write something like this:
let cloneBody = $('body').clone().find('script').remove().end();
console.log(cloneBody.text().trim());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>
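One caveat: .text() on the cloned body also returns the whitespace text nodes that sit between elements, so the trimmed result can still contain line breaks between the pieces. If a single-line result such as TEST test is wanted, the whitespace can be collapsed afterwards. A minimal sketch, where collapseWhitespace is a hypothetical helper name and not part of jQuery:

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space and trim both ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Assumed usage with the clone from the snippet above (needs jQuery and a browser):
// console.log(collapseWhitespace(cloneBody.text()));
```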
Upvotes: 3
Reputation: 1620
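You can do this using plain JavaScript, as shown in a previous answer: Removing all script tags from html with JS Regular Expression
Note that the function below returns the remaining HTML; to get the text only, return div.textContent instead of div.innerHTML.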
function stripScripts(s) {
  // Parse the markup in a detached div so nothing is added to the page
  var div = document.createElement('div');
  div.innerHTML = s;
  var scripts = div.getElementsByTagName('script');
  // Iterate backwards because the live collection shrinks as nodes are removed
  var i = scripts.length;
  while (i--) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return div.innerHTML;
}
alert(
  stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);
Upvotes: 2