Reputation: 4765
How can I select all visible renderable HTML text nodes in a browser document?
In other words, how can I get a list of DOM nodes I can traverse via scripting in order to obtain the text that is actually visible to the user in the browser, in document order?
I would like to rely on the browser to tell me the nodes that constitute currently visible renderable text. I'm not sure where to start. Help?
Upvotes: 1
Views: 123
Reputation: 35670
This is tricky, but here's what I've come up with:
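function traverse(o) {
  var a = [];
  [].forEach.call(o.childNodes, function(val) {
    if (val.nodeType === 3) {
      // Text node: keep it unless it contains only whitespace.
      if (val.nodeValue.trim() !== '') a.push(val);
    }
    else if (val.nodeType === 1) {
      // Element node. getComputedStyle() throws on non-elements such as
      // comments, so every other node type is skipped.
      var style = getComputedStyle(val);
      if (val.tagName !== 'NOSCRIPT' &&
          style.getPropertyValue('display') !== 'none' &&
          style.getPropertyValue('visibility') !== 'hidden' &&
          style.getPropertyValue('opacity') !== '0' &&
          style.getPropertyValue('color') !== style.getPropertyValue('background-color')
      ) {
        // The element looks visible, so collect the text nodes beneath it.
        a = a.concat(traverse(val));
      }
    }
  });
  return a;
} //traverse

var textNodes = traverse(document.body);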
Note that this does not check whether text nodes are hidden behind other elements or absolutely positioned offscreen.
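Those two gaps could be narrowed with a couple of extra filters. What follows is only a rough sketch, not part of the answer above: the helper names `rectIntersectsViewport`, `isTextNodeInViewport` and `isTextNodeOnTop` are made up for illustration. It assumes "visible" means "inside the current viewport", so text that has merely been scrolled out of view is rejected too, and the occlusion test samples just one point per text node.

```javascript
// Pure geometry helper: does a rectangle overlap a viewport of the given size?
// `rect` only needs left/top/right/bottom in viewport coordinates, which is
// what getBoundingClientRect() returns.
function rectIntersectsViewport(rect, viewportWidth, viewportHeight) {
  var hasArea = rect.right > rect.left && rect.bottom > rect.top;
  return hasArea &&
    rect.right > 0 && rect.bottom > 0 &&
    rect.left < viewportWidth && rect.top < viewportHeight;
}

// Browser-only: measure a text node with a Range and test it against the
// viewport. Assumption: offscreen means outside the current viewport.
function isTextNodeInViewport(textNode) {
  var range = document.createRange();
  range.selectNodeContents(textNode);
  return rectIntersectsViewport(
    range.getBoundingClientRect(),
    document.documentElement.clientWidth,
    document.documentElement.clientHeight
  );
}

// Browser-only: rough occlusion heuristic. Looks at the centre of the text
// node's bounding box and checks that the topmost element at that point is
// the text's parent element (or something inside it). A single sample point
// can misjudge partially covered or multi-line text.
function isTextNodeOnTop(textNode) {
  var range = document.createRange();
  range.selectNodeContents(textNode);
  var r = range.getBoundingClientRect();
  var hit = document.elementFromPoint(
    (r.left + r.right) / 2,
    (r.top + r.bottom) / 2
  );
  return hit !== null && textNode.parentNode.contains(hit);
}
```

In a browser, the result of the function above could then be narrowed with something like `traverse(document.body).filter(isTextNodeInViewport).filter(isTextNodeOnTop)`.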
Upvotes: 2
Reputation: 10924
If you only need the visible text itself rather than the nodes, you should be able to do this in one line of JavaScript:
document.querySelector("body").innerText
Upvotes: 1