NBE

Reputation: 651

Using rvest or RSelenium to create automated webscrape of table inside frame

I know there are many resources and questions on this subject, but I have been trying for days and can't figure it out. I have scraped websites before, but this one is causing me problems.

The website: njaqinow.net

What I want scraped: I would like to scrape the table under the "Current Status"->"Pollutants" tab. I would like to have this scraped every time the table is updated so I can use this information inside a shiny app I am creating.


What I have tried: I have tried numerous different approaches but for simplicity I will show my most recent approach:

    library("rvest")
    url <- "http://www.njaqinow.net"
    webpage <- read_html(url)

    test <- webpage %>%
      html_node("table") %>%
      html_table()

My guess is that this is far more complicated than I originally thought, because the table appears to be inside a frame. I am not a JavaScript/HTML pro, so I am not entirely sure. Any help or guidance would be greatly appreciated!

Upvotes: 1

Views: 918

Answers (1)

Tonio Liebrand

Reputation: 17699

I can contribute a solution with RSelenium. I will show you how to navigate to that table and get its content. For formatting the table content I provide a link to another question, since that is outside the scope of this answer.

I think you have two challenges: switching into a frame, and switching between frames. Switching into a frame is done with remDr$switchToFrame().

Switching between frames is discussed here: https://github.com/ropensci/RSelenium/issues/155. In your case:

    remDr$switchToFrame("contents")
    ...
    remDr$switchToFrame(NA)
    remDr$switchToFrame("contentsi")

Full code would read:

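    # Assumes remDr is an already-open RSelenium remoteDriver session.
    remDr$navigate("http://www.njaqinow.net")

    # Enter the navigation frame and open Current Status -> POLLUTANTS.
    frame1 <- remDr$findElement("xpath", "//frame[@id = 'contents']")
    remDr$switchToFrame(frame1)
    remDr$findElement("xpath", "//*[text() = 'Current Status']")$clickElement()
    remDr$findElement("xpath", "//*[text() = 'POLLUTANTS']")$clickElement()

    # Go back to the top-level document, then enter the frame holding the table.
    remDr$switchToFrame(NA)
    remDr$switchToFrame("contentsi")

    # Grab the table element and read its visible text.
    table <- remDr$findElement("xpath", "//table[@id = 'C1WebGrid1']")
    table$getElementText()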
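
As a possible alternative to reading the element text, the HTML of the current frame could be handed to rvest so that html_table() builds the data frame. This is only a sketch: the page_source string below is a made-up stand-in for what remDr$getPageSource()[[1]] might return once you are inside the "contentsi" frame, and the column names and values are hypothetical.

```r
library(rvest)

# Stand-in for remDr$getPageSource()[[1]]; the markup and values are made up.
page_source <- '<html><body><table id="C1WebGrid1">
  <tr><th>Station</th><th>O3</th></tr>
  <tr><td>Bayonne</td><td>41</td></tr>
  <tr><td>Camden</td><td>38</td></tr>
</table></body></html>'

# Parse the HTML string, select the table by id, and convert it to a data frame.
pollutants_tbl <- read_html(page_source) %>%
  html_node(xpath = "//table[@id = 'C1WebGrid1']") %>%
  html_table()

print(pollutants_tbl)
```

In a live session you would replace the literal string with the page source taken after the last switchToFrame() call.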

For formatting a table you could look here: scraping table with R using RSelenium
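
If you only have the text from getElementText(), a rough base R sketch of the formatting step could look like the following. It assumes the text comes back as a list holding one string, with rows separated by newlines and cells separated by spaces; the sample values and the helper name parse_table_text are hypothetical, and cells that contain spaces would need different handling.

```r
# Hypothetical sample of what table$getElementText() might return.
# The station names and readings are made up for illustration.
raw <- list("Station CO NO2 O3\nBayonne 0.3 12 41\nCamden 0.2 9 38")

# read.table() reads the string line by line and splits each line on whitespace;
# the first line is used as the header.
parse_table_text <- function(element_text) {
  read.table(text = element_text[[1]], header = TRUE,
             stringsAsFactors = FALSE)
}

pollutants <- parse_table_text(raw)
print(pollutants)
```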

Upvotes: 2
