Eric Robinson

Reputation: 47

Using R to scrape data from a table populated possibly with javascript

Hello fellow R fanatics...

I've been using R to scrape data from a variety of websites for a while now, but this one has me stumped.

I am trying to scrape the data from the following table: http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=

However, my efforts thus far have failed.

I have tried the following:

  1. A simple wget, which returns the site's HTML and some of the JavaScript functions used to populate the table. I haven't been able to find the parts I could use to grab the data with R's JS utilities, possibly because my experience with JS is quite limited.
  2. The solution from "Reading data from iframe", because it looked like the original website had the table in an iframe, but again no luck.
  3. A combination of getURL and readHTMLTable:

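    library(RCurl)  # provides getURL
    library(XML)    # provides readHTMLTable
    thisURL = "http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b="
    # Fetch the raw HTML, skipping SSL peer verification
    theURL = getURL(thisURL, .opts = list(ssl.verifypeer = FALSE))
    tables = readHTMLTable(theURL)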

This results in an empty table.

  4. Spending about an hour going through every part of the HTML and JavaScript code I could find, but with limited success, as detailed in 1.

It looks like the RSelenium package might offer a solution, but I haven't yet figured out how to use it here, probably because I'm unfamiliar with it.

I feel like I'm just missing an essential part here... perhaps due to my lack of knowledge of JS and XML?

UPDATE:

I've noticed that if I right-click on the table element and use Chrome's "inspect" it generates HTML that has all of the table's values in it and would be very scrape-able... I'm still not sure how to get to this point in R though. Anyone have hints on where to look in the "inspect" screen to try and guide my progress?

Upvotes: 1

Views: 250

Answers (1)

Eric Robinson

Reputation: 47

The solution to this was the following:

  1. In the page source, identify the source HTML page for the table.
  2. Navigate to that source page and open Chrome Developer Tools > Network > XHR.
  3. Refresh the page so the requests are listed, and find the one that supplies the data.
  4. Scrape from that source directly.
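
For anyone landing here later, the steps above can be sketched roughly as follows. This is only a minimal sketch: `xhr_url` is a hypothetical placeholder for whatever request URL shows up under Network > XHR, `parse_tables` and `scrape_xhr` are helper names made up for illustration, and it assumes the XHR response is an HTML fragment containing the table. If the response turns out to be some other format, the parsing step would need to change.

```r
library(XML)  # htmlParse, readHTMLTable

# Parse every <table> found in a chunk of HTML text into a list of data frames.
parse_tables <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Hypothetical placeholder: replace with the request URL seen under Network > XHR.
xhr_url <- "http://example.com/path/found/in/devtools"

# Fetch the XHR source directly (getURL from RCurl, as in the question), then parse it.
scrape_xhr <- function(url) {
  parse_tables(RCurl::getURL(url))
}

# Offline check using a made-up fragment standing in for the real response.
sample_html <- paste0(
  "<table>",
  "<tr><th>station</th><th>temp</th></tr>",
  "<tr><td>07005</td><td>12.3</td></tr>",
  "</table>"
)
tables <- parse_tables(sample_html)
```

Once the real request URL is known, `scrape_xhr(xhr_url)` would take the place of the offline check. The point is that the data request is fetched on its own, so no JavaScript needs to run in R.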

Thanks to @XR SC, whose answer to "web scraping using Chrome Dev Tools" provided the basic approach.

Upvotes: 2
