Reputation: 121568
I am trying to get the list of selectors on this page:
$("#Lastname"),$(".intro"),....
Here is my attempt using xpathSApply:
library(XML)
library(RCurl)
a <- getURL('http://www.w3schools.com/jquery/trysel.asp')
doc <- htmlParse(a)
xpathSApply(doc,'//*[@id="selectorOptions"]') ## I can't get the right xpath
I also tried the following, without success:
xpathSApply(doc,'//*[@id="selectorOptions"]/div[i]')
EDIT: I added the python tag since I would also accept a Python solution.
Upvotes: 3
Views: 406
Reputation: 30425
The following is an R way to get at JavaScript-driven pages like this one. As @Peyton noted, you will need to use a browser. Selenium Server is one good way to control a browser, and I have written R bindings for it at https://github.com/johndharrison/RSelenium
The following gives access to the page source after the JavaScript has run:
require(devtools)
devtools::install_github("RSelenium", "johndharrison")
library(RSelenium)
library(RJSONIO)
# an active Selenium server must be running
# the following commented-out lines fetch the latest Java binary and start it
# RSelenium::checkForServer()
# RSelenium::startServer()
# a Selenium server is assumed to be running from here on
remDR <- remoteDriver$new()
remDR$open() # opens a browser, usually Firefox, with default settings
remDR$navigate('http://www.w3schools.com/jquery/trysel.asp') # navigate to your page
webElem <- remDR$findElements(value = "//*[@id='selectorOptions']") # find your elements
# display the appropriate quantities
cat(fromJSON(webElem[[1]]$getElementText())$value)
> cat(fromJSON(webElem[[1]]$getElementText())$value)
$("#Lastname")
$(".intro")
$(".intro, #Lastname")
$("h1")
$("h1, p")
$("p:first")
$("p:last")
$("tr:even")
$("tr:odd")
$("p:first-child")
$("p:first-of-type")
$("p:last-child")
$("p:last-of-typ
.....................
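Since the question also accepts Python, here is a minimal sketch of why a static parse (as in the question's xpathSApply attempt) comes back without the selector list while a browser-driven session does not. The markup below is a hypothetical, heavily simplified stand-in for the page, not its real source; the assumption is only that the container is filled in by script after loading.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the downloaded page source.
# Assumption: the real page ships an empty container and fills it from script.
STATIC_SOURCE = """
<html>
  <body>
    <div id="selectorOptions"></div>
    <script>
      var w3Sels = [];
      w3Sels.push("#Lastname");
      w3Sels.push(".intro");
    </script>
  </body>
</html>
"""

root = ET.fromstring(STATIC_SOURCE)

# The XPath itself is fine: the container element is found...
container = root.find(".//*[@id='selectorOptions']")

# ...but a static parser never runs the script, so the container is empty.
children = list(container)
text = (container.text or "").strip()
print(len(children), repr(text))
```

The XPath locates the element, but there is nothing inside it to extract until a browser has executed the script, which is what the Selenium session above takes care of.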
UPDATE:
A much easier way to access the information in this case is to use the executeScript method:
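library(RSelenium)
RSelenium::startServer() # start a Selenium server
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate('http://www.w3schools.com/jquery/trysel.asp')
# w3Sels is the JavaScript array the page uses to hold its selectors
remDr$executeScript("return w3Sels;")[[1]]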
> remDr$executeScript("return w3Sels;")[[1]]
[1] "#Lastname" ".intro"
[3] ".intro, #Lastname" "h1"
[5] "h1, p" "p:first"
[7] "p:last" "tr:even"
[9] "tr:odd" "p:first-child"
[11] "p:first-of-type" "p:last-child"
[13] "p:last-of-type" "li:nth-child(1)"
[15] "li:nth-last-child(1)" "li:nth-of-type(2)"
[17] "li:nth-last-of-type(2)" "b:only-child"
[19] "h3:only-of-type" "div > p"
[21] "div p" "ul + h3"
[23] "ul ~ table" "ul li:eq(0)"
[25] "ul li:gt(0)" "ul li:lt(2)"
[27] ":header" ":header:not(h1)"
[29] ":animated" ":focus"
[31] ":contains(Duck)" "div:has(p)"
[33] ":empty" ":parent"
[35] "p:hidden" "table:visible"
[37] ":root" "p:lang(it)"
[39] "[id]" "[id=my-Address]"
[41] "p[id!=my-Address]" "[id$=ess]"
[43] "[id|=my]" "[id^=L]"
[45] "[title~=beautiful]" "[id*=s]"
[47] ":input" ":text"
[49] ":password" ":radio"
[51] ":checkbox" ":submit"
[53] ":reset" ":button"
[55] ":image" ":file"
[57] ":enabled" ":disabled"
[59] ":selected" ":checked"
[61] "*"
Upvotes: 4
Reputation: 121568
Thanks to jdharrison's comment, I parsed the JavaScript code to extract all the selectors. As Peyton mentioned, this works in this particular case only because all the selectors are in the code.
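# dump the script element that builds the selector list to a file
capture.output(xpathSApply(doc,'//*/script')[[6]],
file='test.js')
ll <- readLines('test.js')
# keep only the lines that push a selector onto w3Sels
ll <- ll[grepl('w3Sels.push',ll)]
# grab the text between the parentheses; note the lazy match stops at the
# first ')', so selectors that contain parentheses are cut short
ll <- unlist(regmatches(ll, gregexpr("(?<=\\().*?(?=\\))", ll, perl=TRUE)))
cat(head(ll))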
"#Lastname" ".intro" ".intro, #Lastname" "h1" "h1, p" "p:first"
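For the Python tag, the same idea can be sketched with the standard re module. The script text below is a hypothetical excerpt built from the selectors shown in this thread, not the page's actual file, and the pattern assumes each selector is pushed as a double-quoted string literal. Matching the quoted argument, rather than everything up to the first closing parenthesis, keeps selectors such as li:nth-child(1) intact.

```python
import re

# Hypothetical excerpt of the page's script; the real file may differ.
SCRIPT_TEXT = '''
var w3Sels = [];
w3Sels.push("#Lastname");
w3Sels.push(".intro");
w3Sels.push(".intro, #Lastname");
w3Sels.push("li:nth-child(1)");
'''

# Capture the double-quoted argument of each w3Sels.push(...) call.
# Assumption: selectors are always passed as double-quoted string literals.
PUSH_CALL = re.compile(r'w3Sels\.push\("((?:[^"\\]|\\.)*)"\)')


def extract_selectors(script_text):
    """Return the selector strings pushed onto w3Sels, in source order."""
    return PUSH_CALL.findall(script_text)


selectors = extract_selectors(SCRIPT_TEXT)
print(selectors)
```

As with the R version, this only works because the selectors appear literally in the script; it does not execute any JavaScript.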
Upvotes: 0