Reputation: 281
I'm trying to scrape the scrolling table from the following link: http://proximityone.com/cd114_2013_2014.htm
I'm using rvest but am having trouble finding the correct xpath for the table. My current code is as follows:
url <- "http://proximityone.com/cd114_2013_2014.htm"
table <- gis_data_html %>%
html_node(xpath = '//span') %>%
html_table()
Currently I get the error "no applicable method for 'html_table' applied to an object of class "xml_missing""
Anyone know what I would need to change to scrape the interactive table in the link?
Upvotes: 2
Views: 2383
Reputation: 4537
So the problem you're facing is that rvest
will read the source of a page, but it won't execute the javascript on the page. When I inspect the interactive table, I see
<textarea id="aw52-box-focus" class="aw-control-focus " tabindex="0"
onbeforedeactivate="AW(this,event)" onselectstart="AW(this,event)"
onbeforecopy="AW(this,event)" oncut="AW(this,event)" oncopy="AW(this,event)"
onpaste="AW(this,event)" style="z-index: 1; width: 100%; height: 100%;">
</textarea>
but when I look at the page source, "aw52-box-focus" doesn't exist. This is because it's created as the page loads via javascript.
You have a couple of options to deal with this. The 'easy' one is to use RSelenium
and drive an actual browser to load the page, then grab the element after it's rendered. The other option is to read through the javascript, see where it's getting the data from, and tap into that directly rather than scraping the table.
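If you go the RSelenium route, a rough sketch looks something like this (the browser, port, and wait time are assumptions; adjust them to your setup):

library(RSelenium)
library(rvest)

rD <- rsDriver(browser = "firefox", port = 4545L)  # start a selenium server + browser
remDr <- rD$client

remDr$navigate("http://proximityone.com/cd114_2013_2014.htm")
Sys.sleep(5)  # give the page's javascript time to build the table

# grab the rendered source, which now contains the table elements
rendered <- read_html(remDr$getPageSource()[[1]])

remDr$close()
rD$server$stop()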
UPDATE
Turns out it's really easy to read the javascript - it's just loading a CSV file. The address is in plain text, http://proximityone.com/countytrends/cd114_acs2014utf8_hl.csv
The .csv doesn't have column headers, but those are in the <script>
as well
var columns = [
"FirstNnme",
"LastName",
"Party",
"Feature",
"St",
"CD",
"State<br>CD",
"State<br>CD",
"Population<br>2013",
"Population<br>2014",
"PopCh<br>2013-14",
"%PopCh<br>2013-14",
"MHI<br>2013",
"MHI<br>2014",
"MHI<br>Change<br>2013-14",
"%MHI<br>Change<br>2013-14",
"MFI<br>2013",
"MFI<br>2014",
"MFI<br>Change<br>2013-14",
"%MFI<br>Change<br>2013-14",
"MHV<br>2013",
"MHV<br>2014",
"MHV<br>Change<br>2013-14",
"%MHV<br>Change<br>2013-14",
]
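If you only need this one file, you can read that CSV straight in and attach the names by hand. A minimal sketch, assuming you've copied the list above into an R vector called col_names (make.unique takes care of the repeated "State&lt;br&gt;CD" entry):

# col_names is assumed to be the javascript list above copied into R, e.g.
# col_names <- c("FirstNnme", "LastName", "Party", ...)
cd_data <- read.csv("http://proximityone.com/countytrends/cd114_acs2014utf8_hl.csv",
                    header = FALSE, stringsAsFactors = FALSE)

# strip the <br> display markup and attach the names if the count matches
if (length(col_names) == ncol(cd_data)) {
  names(cd_data) <- make.unique(gsub("<br>", " ", col_names))
}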
Programmatic Solution
Instead of digging through the javascript by hand (in case there are several such pages on this website you want), you can attempt this programmatically too. We read the page, get the <script>
nodes, get their text (the scripts themselves), and look for references to a csv file. Then we expand the relative URL and read it in. This doesn't help with column names, but those aren't hard to extract either (see the sketch after the code).
library(rvest)

page = read_html("http://proximityone.com/cd114_2013_2014.htm")

# pull the text of every <script> node and keep the one(s) that reference a .csv
scripts = page %>%
  html_nodes("script") %>%
  html_text() %>%
  grep("\\.csv", ., value = TRUE)

# extract the relative ../ path to the csv and expand it into a full URL
relCSV = stringr::str_extract(scripts, "\\.\\./.*?csv")
fullCSV = gsub("\\.\\.", "http://proximityone.com", relCSV)

# the file has no header row, so read it without one
data = read.csv(fullCSV, header = FALSE)
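To pick the column names up programmatically as well, you can mine the same script text. This is only a sketch and assumes a single matching <script> with a declaration that looks like the var columns = [...] block quoted above:

# pull the quoted names out of the "var columns = [...]" declaration
col_block <- stringr::str_extract(scripts[1], "var columns = \\[[^\\]]*\\]")
col_names <- gsub('"', "", stringr::str_extract_all(col_block, '"[^"]*"')[[1]])

# strip the <br> markup and attach the names if the count matches the data
if (!is.na(col_block) && length(col_names) == ncol(data)) {
  names(data) <- make.unique(gsub("<br>", " ", col_names))
}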
Upvotes: 6