I've written this function to try to extract the text from a node:
<script type = "text/javascript">
function extractText(node){
  var all = "";
  for (node=node.firstChild;node;node=node.nextSibling){
    alert(node.nodeValue + " = " + node.nodeType);
    if (node.nodeType == 3){
      all += node.nodeValue;
    }
  }
  alert(all);
}
</script>
This script is in the <head> of an HTML document. The body looks like this:
<body onload = "extractText(document.body)">
Stuff
<b>text</b>
<script>
var x = 1;
</script>
</body>
The problem is that alert(all); only shows "Stuff", and the alert(node.nodeValue + " = " + node.nodeType); call inside the loop produces a bunch of null entries that I don't understand. It says null = 3 a few times. Could anyone tell me why this isn't working properly? Thanks in advance.
If you want all of the text in the document, including text nested inside child elements such as <b>, you may want to look into a recursive call. However, if you don't care about children, remove the first if (node.hasChildNodes()){} condition in the following:
function extractText(node){
  var txt = '';
  // Recursive walk. Uncomment the nodeName check to skip <script>s:
  // a <script> has children, because its source code is itself
  // a text node (nodeType === 3).
  if (node.hasChildNodes()/* && node.nodeName !== 'SCRIPT'*/){
    for (var c = 0; c < node.childNodes.length; c++){
      txt += extractText(node.childNodes[c]);
    }
  }else if(node.nodeType===3){
    txt += node.textContent;
  }
  return txt;
}
alert(extractText(document.body));
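To see why the question's loop only picks up "Stuff", here is a minimal sketch that runs the question's flat loop and the recursive function above side by side. It assumes hand-built stand-in objects (the hypothetical textNode and elementNode helpers) in place of real DOM nodes so it runs outside a browser, and it iterates childNodes instead of firstChild/nextSibling, which visits the same direct children. The mock body only approximates the whitespace a browser would actually produce.

```javascript
// Hypothetical stand-ins for DOM nodes. Real DOM nodes expose the same
// nodeType / nodeName / nodeValue / textContent / childNodes /
// hasChildNodes() members used below.
function textNode(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value,
           textContent: value, childNodes: [],
           hasChildNodes() { return false; } };
}
function elementNode(name, children) {
  return { nodeType: 1, nodeName: name, nodeValue: null,
           childNodes: children,
           hasChildNodes() { return this.childNodes.length > 0; } };
}

// Roughly the question's <body>: text, <b>, text, <script>, text.
const body = elementNode("BODY", [
  textNode("\nStuff\n"),
  elementNode("B", [textNode("text")]),
  textNode("\n"),
  elementNode("SCRIPT", [textNode("\nvar x = 1;\n")]),
  textNode("\n"),
]);

// The question's approach: only direct children are visited, so the
// "text" inside <b> (a grandchild of <body>) is never reached.
function flatText(node) {
  let all = "";
  for (const child of node.childNodes) {
    if (child.nodeType === 3) all += child.nodeValue;
  }
  return all;
}

// The recursive version from the answer, same logic.
function extractText(node) {
  let txt = "";
  if (node.hasChildNodes()) {
    for (let c = 0; c < node.childNodes.length; c++) {
      txt += extractText(node.childNodes[c]);
    }
  } else if (node.nodeType === 3) {
    txt += node.textContent;
  }
  return txt;
}

console.log(JSON.stringify(flatText(body)));    // "\nStuff\n\n\n"
console.log(JSON.stringify(extractText(body))); // "\nStuff\ntext\n\nvar x = 1;\n\n"
```

The flat loop sees only whitespace plus "Stuff", while the recursive walk also reaches the <b> text and the <script> source.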
Also, you probably want to read textContent rather than nodeValue, but that's your call. You can also get more granular: test whether the nodeName is SCRIPT and ignore it if you so choose, but I'll let you make that determination.
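One way to make the <script> check optional rather than commented out is a flag parameter. This is a sketch, not the code from the answer's fiddle: the skipScripts parameter and the mock-node helpers are hypothetical, and the mock tree stands in for document.body so the snippet runs without a DOM. In an HTML document, nodeName is upper case for HTML elements, so comparing against "SCRIPT" works there.

```javascript
// Hypothetical stand-ins exposing only what extractText reads.
function textNode(value) {
  return { nodeType: 3, nodeName: "#text", textContent: value,
           childNodes: [], hasChildNodes() { return false; } };
}
function elementNode(name, children) {
  return { nodeType: 1, nodeName: name, childNodes: children,
           hasChildNodes() { return this.childNodes.length > 0; } };
}

// The answer's recursive walk, with the commented-out <script> check
// turned into a flag (skipScripts is a hypothetical parameter name).
function extractText(node, skipScripts) {
  let txt = "";
  // When a <script> is skipped, descend is false and the else-if test
  // below also fails (its nodeType is 1), so it contributes nothing.
  const descend = node.hasChildNodes() &&
    !(skipScripts && node.nodeName === "SCRIPT");
  if (descend) {
    for (let c = 0; c < node.childNodes.length; c++) {
      txt += extractText(node.childNodes[c], skipScripts);
    }
  } else if (node.nodeType === 3) {
    txt += node.textContent;
  }
  return txt;
}

// Roughly the question's <body>.
const body = elementNode("BODY", [
  textNode("\nStuff\n"),
  elementNode("B", [textNode("text")]),
  textNode("\n"),
  elementNode("SCRIPT", [textNode("\nvar x = 1;\n")]),
  textNode("\n"),
]);

console.log(JSON.stringify(extractText(body, false))); // "\nStuff\ntext\n\nvar x = 1;\n\n"
console.log(JSON.stringify(extractText(body, true)));  // "\nStuff\ntext\n\n"
```

In a page you would call extractText(document.body, true) instead of using the mock tree.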
Follow-up: here's a fiddle you can play with, with the <script> test commented out and optional whitespace removal: http://jsfiddle.net/KZuk5/2/
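The fiddle's exact whitespace handling isn't reproduced here. One common approach, assumed for this sketch, is to collapse each run of whitespace in the extracted string into a single space and trim the ends. The normalizeWhitespace name is hypothetical.

```javascript
// Hypothetical helper: collapse runs of whitespace (spaces, tabs,
// newlines) into one space, then strip leading/trailing whitespace.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Example input: what a recursive walk of the question's body might
// return when <script> contents are skipped.
const raw = "\nStuff\ntext\n\n";
console.log(normalizeWhitespace(raw)); // Stuff text
```

This only normalizes whitespace that is already present; text from adjacent elements with nothing between them (such as <b>a</b><i>b</i>) still runs together as "ab".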
There are different types of nodes; specifically, we're looking at two: a text node and an HTML (element) node. A text node has a property called nodeValue that holds its text (and you're accessing it properly). However, HTML nodes do not have a usable nodeValue property (or rather, it is set to null), which is where the null entries in your alert come from.
To get the inner value of an HTML node, use .innerHTML (or .textContent if you want only the text, without the markup).
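To check which entries in the question's alerts come from element nodes with a null nodeValue, a small diagnostic can label each child by its nodeType. This is a sketch: describeChildren and the NODE_TYPE_NAMES table are hypothetical names, though the numbers match the DOM's Node.ELEMENT_NODE (1), Node.TEXT_NODE (3) and Node.COMMENT_NODE (8) constants. The plain objects stand in for the body's children so the snippet runs without a browser; in a page you would pass document.body itself.

```javascript
// Labels for the common nodeType values (same numbers as the DOM's
// Node.ELEMENT_NODE, Node.TEXT_NODE and Node.COMMENT_NODE).
const NODE_TYPE_NAMES = { 1: "ELEMENT_NODE", 3: "TEXT_NODE", 8: "COMMENT_NODE" };

// Hypothetical helper: one line per direct child, showing its nodeValue.
// Array.from accepts both a real NodeList and a plain array.
function describeChildren(parent) {
  return Array.from(parent.childNodes, (child) => {
    const type = NODE_TYPE_NAMES[child.nodeType] || "nodeType " + child.nodeType;
    return child.nodeName + " (" + type + "): nodeValue = " +
      JSON.stringify(child.nodeValue);
  });
}

// Stand-in for the question's <body>; element nodes have nodeValue null.
const body = {
  childNodes: [
    { nodeType: 3, nodeName: "#text", nodeValue: "\nStuff\n" },
    { nodeType: 1, nodeName: "B", nodeValue: null },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n" },
    { nodeType: 1, nodeName: "SCRIPT", nodeValue: null },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n" },
  ],
};

describeChildren(body).forEach((line) => console.log(line));
```

The null values belong to the B and SCRIPT element entries, while the text entries carry their (mostly whitespace) text.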