user1486507

Reputation: 709

R, right xpath to grab the text using xpathSApply

It's a simple XPath exercise, but I cannot get it to work.

When I inspect a button element (using Google Chrome), it shows the tree below. I'd like to grab the title, such as "Distinguished Contributor" or "Board Manager".

<span class="author-by"></span>

<span class="UserName lia-user-name">

    <img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor"></img>

.....

<span class="author-by"></span>

<span class="UserName lia-user-name">

    <img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager"></img>

So far, I tried

> xpathSApply(htmltree, "//img[@class='lia-user-rank-icon-left']", xmlGetAttr, "href")

> test = "//img/@title"
> a <- xpathSApply(htmltree, test, function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

and a bunch of others, but none has worked so far. Any guidance would be greatly appreciated!

Upvotes: 1

Views: 713

Answers (1)

Randy Lai

Reputation: 3174

Here is an example that gets the source of images with class 'dno'. In your case, I think you have to change 'dno' to 'lia-user-rank-icon-left' and 'src' to 'title'.

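library(RCurl)
library(XML)
# Download the page source as a single string
text = getURL("http://stackoverflow.com/questions/23024062/r-right-xpath-to-grab-the-text-using-xpathsapply")
# Parse the HTML string into a document tree
d = htmlParse(text)
# Step 1: collect the matching <img> nodes
L = xpathApply(d, "//img[@class='dno']")
# Step 2: read the 'src' attribute of each node
sapply(L, xmlGetAttr, "src")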

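You can replace the last two lines with xpathSApply(d, "//img[@class='dno']", xmlGetAttr, "src"), which applies the function to each matched node and simplifies the result in one call (xpathApply with the same arguments also works, but returns a list). For debugging purposes, however, it is better to split it into two commands.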
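Applied to the markup in the question, this could look roughly like the sketch below. It assumes the snippet from the question is wrapped in a minimal page and parsed from an inline string (the `html` variable is made up for illustration; the question's `htmltree` would come from the real page). The attribute is 'title' rather than 'href', since the img elements shown have no href attribute.

```r
library(XML)

# Inline copy of the markup from the question (assumption: wrapped in a
# minimal <html><body> so it parses without fetching the real page)
html <- '<html><body>
<span class="UserName lia-user-name">
  <img id="display_3" class="lia-user-rank-icon-left" alt="Distinguished Contributor" title="Distinguished Contributor"/>
</span>
<span class="UserName lia-user-name">
  <img id="display_25" class="lia-user-rank-icon-left" alt="Board Manager" title="Board Manager"/>
</span>
</body></html>'

htmltree <- htmlParse(html, asText = TRUE)

# Select the <img> nodes by class, then read the 'title' attribute of each
titles <- xpathSApply(htmltree,
                      "//img[@class='lia-user-rank-icon-left']",
                      xmlGetAttr, "title")

# Alternative: select the attribute nodes directly in the XPath expression;
# as.character() drops any names attached to the result
titles2 <- as.character(
  xpathSApply(htmltree, "//img[@class='lia-user-rank-icon-left']/@title"))

print(titles)
```

Note that the exact match on @class only works because the class attribute contains a single class name here; if the real page adds more classes to the img, something like contains(@class, 'lia-user-rank-icon-left') may be needed instead.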

Upvotes: 2
