My goal is to extract a particular area of text from a web page. Imagine being able to draw a rectangle anywhere on a page and have everything inside it copied to your clipboard. I am using FireBug with its console window and XPath for this purpose (feel free to suggest other solutions; I have searched for plugins and bookmarklets but did not find anything useful). The values I want to obtain are in the following format (as observed in FireBug's "HTML inspect"):
<span class="number3_0" title="Numbers">3.00</span>
So I ended up with the following code, which I run from the FireBug console:
$x("//span[@title='Numbers']/text()")
This returns something like:
[<TextNode textContent="2.00">, <TextNode textContent="2.00">, <TextNode textContent="2.00">, <TextNode textContent="2.00">, <TextNode textContent="3.00">]
Then I right-click on the opening [, select "Inspect in DOM panel", press Ctrl+A, and copy/paste the data, which comes out in the following format:
0 <TextNode textContent="2.00">
1 <TextNode textContent="2.00">
2 <TextNode textContent="2.00">
3 <TextNode textContent="2.00">
4 <TextNode textContent="3.00">
As you can guess, the value of textContent is the information I am interested in. I have tried to modify the original XPath query so that it returns only these numbers, with no luck. Among many other things, I tried wrapping the whole query in string(), as suggested in "Xpath - get only node content without other elements", and tried to figure out how the approach in "Extracting text in between nodes through XPath" works.
To obtain the desired values I used some Bash scripting plus XML formatting. After this tedious, error-prone step I get the following:
<?xml version="1.0"?>
<head>
<TextNode textContent="2.00"/>
<TextNode textContent="2.00"/>
<TextNode textContent="2.00"/>
<TextNode textContent="2.00"/>
<TextNode textContent="3.00"/>
<TextNode textContent="3.00"/>
</head>
Now I use xmlstarlet to extract those values (yes, I know I could use a regexp in the previous step and have all the data I need, but I am interested in DOM/XPath parsing and want to understand how it works):
cat input | xmlstarlet sel -t -m "//TextNode" -v 'concat(@textContent,"
")'
This finally gives me the desired output:
2.00
2.00
2.00
2.00
3.00
My questions are a bit generic:
Is it possible to modify $x("//span[@title='Numbers']/text()") to immediately get only the numbers and save myself the rest of the steps?
I am still not very familiar with xmlstarlet; selection (sel) mode in particular drives me crazy. I have seen various combinations of the following options:
-c or --copy-of - print copy of XPATH expression
-v or --value-of - print value of XPATH expression
-o or --output - output string literal
-m or --match - match XPATH expression
Can somebody please explain when to use which one? Concrete examples would be great, if possible. In case of interest, these are places where I saw combinations of these options that I do not understand well: http://www.grahl.ch/blog/minutiae-return-content-element-xmlstarlet, "Extracting and dumping elements using xmlstarlet", and "Testing for an XML attribute".
4.) The last question about xmlstarlet is cosmetic syntactic sugar: how do I get nicely newline-separated output? As you can see, I 'cheat' by putting a literal newline in the separator, but when I tried an escape character instead, like this:
cat input | xmlstarlet sel -t -m "//TextNode" -v 'concat(@textContent,"\n")'
it did not work. The original reference I learned a lot from also uses the same 'ugly' form: http://www.ibm.com/developerworks/library/x-starlet/index.html
PS: maybe all these steps could be simplified with curl + xmlstarlet, but it would be handy to also have a FireBug option for pages that require a login and the like.
Thanks for any ideas.
$$("<CSS3 selector>") and $x("<XPATH>") in Firebug actually return a real Array (unlike the results of document.querySelectorAll() or document.evaluate), so they are more convenient.
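For comparison, outside Firebug, document.evaluate returns an XPathResult object rather than an array. Below is a minimal sketch, assuming a browser page and the ordered-snapshot result type, of copying that result into a real Array so the same map/join pipeline applies. The helper name snapshotToArray is hypothetical. This also explains why wrapping the query in string() yields only one number: in XPath 1.0, string() applied to a node-set returns the string value of the first node in document order only.

```javascript
// Hypothetical helper: copy the nodes held by an XPathResult snapshot
// into a real Array. Snapshot results expose snapshotLength and
// snapshotItem(i) instead of array methods such as map().
function snapshotToArray(result) {
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// Browser-only usage (document and XPathResult exist in page scripts and
// in the developer console, not in Node.js):
//
// const result = document.evaluate(
//   "//span[@title='Numbers']/text()",      // expression from the question
//   document,                               // context node
//   null,                                   // no namespace resolver needed
//   XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // stable, indexable result
//   null                                    // no result object to reuse
// );
// const numbersText = snapshotToArray(result)
//   .map(node => node.textContent)
//   .join("\n");
```

An ordered snapshot is used here because, unlike an iterator result, it stays valid if the page changes while you are reading it and can be indexed directly.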
With Firefox + Firebug:
var numbersNode = $x("//span[@title='Numbers']/text()");
var numbersText = numbersNode.map(function(numberNode) {
return numberNode.textContent;
}).join("\n");
// Special command of Firebug to copy text into clipboard:
copy(numbersText);
You can do it even more compactly with ECMAScript 6 arrow functions:
copy($x("//span[@title='Numbers']/text()").map(x => x.textContent).join("\n"));
The same works if you choose $$('span[title="Numbers"]') instead, as William Narmontas suggested.
Florent
From what I gather, you want to collect the numbers from spans whose title is 'Numbers' and get them as a string.
Try the following:
var numberNodes = document.querySelectorAll('span[title="Numbers"]')
function giveText(me) { return me.textContent; }
Array.prototype.map.call(numberNodes, giveText).join("\n");
The first line selects all matching nodes in the document using a CSS selector (meaning you do not need XPath).
The second line defines a function that returns the text content of a node.
The third line maps the elements of the numberNodes list through the giveText function, producing an array of number strings, and finally joins them with newlines.
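An equivalent sketch uses Array.from, which accepts any array-like object (such as the NodeList returned by querySelectorAll) together with an optional mapping function, so the conversion and the map happen in a single call. The function name numbersAsText is hypothetical; the selector is the one used in this answer.

```javascript
// Hypothetical helper: join the text content of every node in an
// array-like collection (NodeList, Array, ...) with newlines.
function numbersAsText(nodeList) {
  // Array.from(arrayLike, mapFn) builds a real Array, calling mapFn on each item.
  return Array.from(nodeList, node => node.textContent).join("\n");
}

// Browser-only usage:
// numbersAsText(document.querySelectorAll('span[title="Numbers"]'));
```

Compared with Array.prototype.map.call, this avoids borrowing a method from Array.prototype and reads the same whether the input is a NodeList or an ordinary array.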
After this you might not need xmlstarlet at all.