Reputation: 709
It's a simple XPath exercise, but I cannot get it to work.
When I inspect the element of a button in Google Chrome, it shows the tree below. I'd like to grab the title, such as "Distinguished Contributor" or "Board Manager".
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor"></img>
.....
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager"></img>
So far, I have tried:
> xpathSApply(htmltree, "//img[@class='lia-user-rank-icon-left']", xmlGetAttr, "href")
> test = "//img/@title"
> a <- xpathSApply(htmltree, test, function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))
and several other variations, but none of them has worked so far. Any guidance would be much appreciated!
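A likely reason the attempts above return nothing useful: the <img> tags in the tree carry id, class, alt and title attributes but no href, so asking xmlGetAttr() for "href" yields NULL for every match. A minimal sketch to check this offline, assuming the snippet is parsed from a string (the html string below is my reconstruction: the <span> tags are closed, and the "....." part and the </img> end tags are dropped):

```r
library(XML)

# Reconstruction of the snippet from the question (assumption: tags closed, "....." omitted)
html <- '<html><body>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor">
</span>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager">
</span>
</body></html>'

# Parse the string itself rather than treating it as a file name
htmltree <- htmlParse(html, asText = TRUE)

# The XPath from the first attempt does match both <img> nodes ...
nodes <- getNodeSet(htmltree, "//img[@class='lia-user-rank-icon-left']")
print(length(nodes))

# ... but neither node has an href attribute, so each lookup returns NULL
hrefs <- lapply(nodes, xmlGetAttr, "href")
print(hrefs)
```

So the XPath expression is not the problem here; the attribute name is.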
Upvotes: 1
Views: 713
Reputation: 3174
This is an example that gets the src attribute of every image with class 'dno'. In your case, I think you need to replace 'dno' with 'lia-user-rank-icon-left' and 'src' with 'title'.
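Applied to the tree in the question, those two substitutions would look roughly like this. It is a sketch that parses a reconstruction of the question's snippet from a string instead of downloading the real page, so the html string is an assumption, not the actual forum markup:

```r
library(XML)

# Reconstruction of the question's snippet (assumption: tags closed, "....." omitted)
html <- '<html><body>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor">
</span>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager">
</span>
</body></html>'

htmltree <- htmlParse(html, asText = TRUE)

# 'dno' becomes the question's class name ...
L <- xpathApply(htmltree, "//img[@class='lia-user-rank-icon-left']")

# ... and 'src' becomes the attribute that holds the rank text
titles <- sapply(L, xmlGetAttr, "title")
print(titles)
```

Note that @class='...' only matches when the class attribute is exactly that string; if the element ever carries extra classes, contains(@class, 'lia-user-rank-icon-left') is a common workaround.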
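library(RCurl)
library(XML)
# Download the page; followlocation = TRUE lets curl follow the redirect from http to https
text = getURL("http://stackoverflow.com/questions/23024062/r-right-xpath-to-grab-the-text-using-xpathsapply", followlocation = TRUE)
# Parse the downloaded string as HTML (asText = TRUE: it is content, not a file name)
d = htmlParse(text, asText = TRUE)
# Select every <img> whose class attribute is exactly 'dno'
L = xpathApply(d, "//img[@class='dno']")
# Extract the src attribute from each matched node
sapply(L, xmlGetAttr, "src")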
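You can replace the last two lines with a single call, xpathApply(d, "//img[@class='dno']", xmlGetAttr, "src"). However, for debugging purposes it is better to split it into two commands, so you can first check that the XPath matches any nodes at all before extracting attributes.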
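For comparison, a sketch of the single-call form on the same reconstructed snippet (again an assumption, not the real page). It uses xpathSApply() so the result is simplified to a character vector, and also shows an alternative that selects the title attribute nodes directly with the @ axis, without any extraction function; the as.character() call is there to drop the names the XML package attaches to attribute values:

```r
library(XML)

# Reconstruction of the question's snippet (assumption: tags closed, "....." omitted)
html <- '<html><body>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor">
</span>
<span class="author-by"></span>
<span class="UserName lia-user-name">
<img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager">
</span>
</body></html>'

htmltree <- htmlParse(html, asText = TRUE)

# One call: apply xmlGetAttr(node, "title") to every match and simplify the result
titles1 <- xpathSApply(htmltree, "//img[@class='lia-user-rank-icon-left']", xmlGetAttr, "title")

# Alternative: let XPath return the title attributes themselves
titles2 <- as.character(xpathSApply(htmltree, "//img[@class='lia-user-rank-icon-left']/@title"))

print(titles1)
print(titles2)
```

The second form is close to the "//img/@title" attempt in the question; the difference is that no function reading href is applied to the attribute values.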
Upvotes: 2