PyroMoto

Reputation: 29

How to get text content of the entire document excluding HTML?

I am trying to get all of the text from the current page. I am using $('body').text(), but it doesn't work the way I'd like: it returns some JavaScript too. I only want the visible text to be searched. Is there any way to do this?

Upvotes: 0

Views: 211

Answers (1)

Heretic Monkey

Reputation: 12113

The following will get you what you want. However, there are caveats.

console.log(jQuery('body *:not(script,style,noscript)').text());
<p>Needs me some text</p>
<style>
noscript { font-weight: bold; }
</style>
<noscript>
<div>whatever dude, I don't script anyway</div>
</noscript>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>

See that *? It matches every element that is a descendant of body, so the :not() is applied to every element on the page, checking whether it is a script, style, or noscript element. If you're lucky, your browser supports this selector natively and runs the check in relatively fast code. That matters because, depending on the size of your page and the number of elements in it, the check could take a considerable amount of time.
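
One caveat worth illustrating: jQuery's .text() returns the combined text of each matched element including its descendants, so with a selector that matches both a container and the elements nested inside it, the nested text can show up more than once. A minimal sketch of an alternative, assuming you are fine with plain DOM calls instead of jQuery, is to walk the tree once and collect only text nodes. The names SKIPPED_TAGS, collectText, and extractText are hypothetical helpers made up for this example, and the sketch does not check CSS visibility (for example display: none).

```javascript
// Hypothetical helpers, not part of jQuery or the DOM API.
// Tags whose contents should not be treated as page text.
const SKIPPED_TAGS = new Set(['SCRIPT', 'STYLE', 'NOSCRIPT']);

// Recursively gather the values of text nodes under `node`,
// skipping whole subtrees rooted at a skipped tag.
function collectText(node, parts = []) {
  if (node.nodeType === 3) {
    // 3 is Node.TEXT_NODE
    parts.push(node.nodeValue);
  } else if (
    node.nodeType === 1 && // 1 is Node.ELEMENT_NODE
    !SKIPPED_TAGS.has(node.nodeName.toUpperCase())
  ) {
    for (const child of node.childNodes) {
      collectText(child, parts);
    }
  }
  return parts;
}

// Concatenate the collected text and collapse runs of whitespace.
function extractText(root) {
  return collectText(root).join('').replace(/\s+/g, ' ').trim();
}

// In a browser you would call: console.log(extractText(document.body));
```

Because each text node is visited exactly once, nested elements do not duplicate their text, and text sitting directly inside body is picked up too. The fragments are concatenated without separators, so text from adjacent block elements can run together; add your own separator if that matters for your search.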

Upvotes: 1
