ColinTea

Reputation: 1058

html_nodes giving {xml_nodeset (0)}

I am trying to scrape data from www.speedtest.net/awards/ca/ontario. For some XPath paths the standard functions seem to work, but for others they don't, and I'm not sure why.

For example, if I go into the head and look for a script element, it works:

library(rvest)
URL<-read_html("http://www.speedtest.net/awards/ca/ontario")
test1<-html_nodes(URL,xpath='/html/head/script[1]')
test1

This will return {xml_nodeset (1)} as expected.

But if I go into the body and try something similar:

test2<-html_nodes(URL,xpath='/html/body/script[1]')
test2

I get {xml_nodeset (0)}.

Why can I not get to the nodes that are under body?

I'm trying to use the code below, but I've traced my issue back to the problem described above.

real<-html_nodes(URL,xpath='/html/body/div[1]/div[3]/div/div[2]/div/div[3]/div[2]/table')
real

Any ideas?

Upvotes: 4

Views: 11017

Answers (2)

ColinTea

Reputation: 1058

Thanks. Using a CSS selector search, I was able to come up with this, which works well for getting the table I wanted (the one in the bottom right).

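library(rvest)
URL <- read_html("http://www.speedtest.net/awards/ca/ontario")
# select every <table> element on the page
tables <- html_nodes(URL, "table")
# html_table() returns one data frame per table; keep the second one
table <- html_table(tables)[[2]]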
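
A hard-coded index such as [[2]] can break if the page adds or reorders tables. As a rough sketch, the table could instead be picked by its column names. The markup and the column names ("Rank", "Provider", "Download") below are made up for illustration and are not taken from the real page:

```r
library(rvest)

# Hypothetical two-table page; the real speedtest.net markup may differ.
page <- read_html('<html><body>
<table><tr><th>Rank</th><th>Provider</th></tr>
<tr><td>1</td><td>A</td></tr></table>
<table><tr><th>Provider</th><th>Download</th></tr>
<tr><td>A</td><td>10</td></tr>
<tr><td>B</td><td>20</td></tr></table>
</body></html>')

# One data frame per <table> element.
tables <- html_table(html_nodes(page, "table"))

# Flag the tables that have a "Download" column (an assumed column name).
has_download <- vapply(tables, function(t) "Download" %in% names(t), logical(1))

# Keep the first table that matches.
wanted <- tables[[which(has_download)[1]]]
```

Running lapply(tables, names) first is a quick way to see which columns each table actually has before settling on a name to match.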

Upvotes: 3

Dave2e

Reputation: 24079

Try this. It may not be complete, but it should give you a head start on answering your question:

library(rvest)
URL<-read_html("http://www.speedtest.net/awards/ca/ontario")
#find the table rows in the page
table<-html_nodes(URL, "tbody tr")

#pull info from the table rows
num<-html_text(html_nodes(table, "td.u-align-right"))
provider<-html_text(html_nodes(table, "td.cell-provider-name"))

#final data.frame with a table of the results
df<-data.frame(provider, matrix(num, ncol=3, byrow=TRUE))

With rvest, I find it easier to search with CSS selectors than with XPath.
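
If you would rather stay with XPath, a relative path such as //table is usually less fragile than an absolute one, because it does not depend on every level of nesting being guessed correctly. Here is a minimal sketch on made-up markup (the real page's structure may differ, and this does not diagnose why the original absolute path fails). It also lists the children of body to see what the parser actually built:

```r
library(rvest)

# Hypothetical markup standing in for the real page.
html <- '<html><head><script>var a = 1;</script></head>
<body>
<div id="wrap">
<table>
<tbody>
<tr><td class="cell-provider-name">Provider A</td><td class="u-align-right">10</td></tr>
<tr><td class="cell-provider-name">Provider B</td><td class="u-align-right">11</td></tr>
</tbody>
</table>
</div>
</body></html>'
page <- read_html(html)

# An absolute path that gets the nesting wrong matches nothing.
absolute <- html_nodes(page, xpath = "/html/body/table")

# A relative path finds the table wherever it sits in the tree.
relative <- html_nodes(page, xpath = "//table")

# The element children of <body>, as seen by the parser.
body_children <- html_name(html_children(html_node(page, "body")))
```

Here absolute is an empty node set, relative contains the one table, and body_children shows that body holds a div rather than the table directly. Running the same html_children() check on the real document is one way to compare the parsed tree with the path copied from the browser.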

Upvotes: 1
