I'm unable to scrape the table at the link below. I've inspected the source code and noted that the table has the class name tablesaw-sortable.
I tested the same method on a Wikipedia page and it was able to extract the table there. Is there any way to read this particular table?
```r
library(rvest)  # read_html(), html_nodes(), html_table() and the %>% pipe

url <- read_html("https://www.wunderground.com/history/airport/KNYC/2015/01/01/DailyHistory.html?HideSpecis=0")
weather_hourly <- url %>%
  html_nodes(xpath = '//*[@class="tablesaw-sortable"]') %>%
  html_table()
```
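A minimal sketch of one possible cause, assuming the live markup puts more than one class on the table (the HTML snippet below is hypothetical, not the real Wunderground page): in XPath, `@class="tablesaw-sortable"` only matches when the attribute is exactly that string, so a table with `class="tablesaw-sortable tablesaw-stack"` is skipped. Matching the class as a whitespace-separated token, or using a CSS class selector, avoids that.

```r
library(rvest)  # read_html(), html_nodes(), html_table() and the %>% pipe

# Hypothetical markup for illustration only; the real page may differ.
page <- read_html('
<table class="tablesaw-sortable tablesaw-stack">
  <thead><tr><th>Time</th><th>Temp</th></tr></thead>
  <tbody><tr><td>12:00 AM</td><td>40</td></tr></tbody>
</table>')

# Exact comparison: matches only if the whole attribute is "tablesaw-sortable"
exact <- page %>% html_nodes(xpath = '//*[@class="tablesaw-sortable"]')

# Token comparison: matches any table whose class list contains the token
token <- page %>%
  html_nodes(xpath = '//table[contains(concat(" ", normalize-space(@class), " "), " tablesaw-sortable ")]')

# Equivalent CSS selector, usually easier to read
css <- page %>% html_nodes("table.tablesaw-sortable")

length(exact)  # 0: the multi-class attribute is not an exact match
length(token)  # 1

# html_table() on a node set returns a list, one data frame per table
weather <- html_table(css)[[1]]
weather
```

If the table is instead inserted by JavaScript after the page loads, it will not be in the HTML that `read_html()` downloads and no selector will find it; that may be why the answer below switches to a different site.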
OK, something like this should get you pretty close to what you want. It pulls the data from timeanddate.com rather than Wunderground.
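```r
library("httr")  # GET(), user_agent(), write_disk()
library("XML")   # readHTMLTable()

URL <- "https://www.timeanddate.com/weather/usa/new-york/historic?month=8&year=2018"

# Download the page with a browser-like user agent and save it to a temp file
temp <- tempfile(fileext = ".html")
GET(url = URL, user_agent("Mozilla/5.0"), write_disk(temp))

# readHTMLTable() returns a list of every <table> in the file; keep the second one
df <- readHTMLTable(temp)
df <- df[[2]]
df
```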
If you want to import data from several URLs (for example, one per month), wrap this in a small loop.
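A sketch of that loop, assuming the timeanddate.com URL pattern above holds for every month and that the wanted table is always the second one on the page. `month_url()`, `fetch_month()` and `fetch_months()` are hypothetical helper names, and the one-second pause between requests is an arbitrary choice.

```r
library(httr)  # GET(), user_agent(), write_disk(), stop_for_status()
library(XML)   # readHTMLTable()

# Build the monthly URL; the query-string pattern is copied from the answer above
month_url <- function(year, month) {
  sprintf("https://www.timeanddate.com/weather/usa/new-york/historic?month=%d&year=%d",
          month, year)
}

# Download one month and return its table.
# Assumption (same as the answer): the wanted table is the second one on the page.
fetch_month <- function(year, month) {
  temp <- tempfile(fileext = ".html")
  resp <- GET(url = month_url(year, month), user_agent("Mozilla/5.0"),
              write_disk(temp, overwrite = TRUE))
  stop_for_status(resp)            # fail loudly on HTTP errors
  tab <- readHTMLTable(temp)[[2]]
  tab$year <- year                 # record where each row came from
  tab$month <- month
  Sys.sleep(1)                     # pause between requests
  tab
}

# Loop over several months and stack the results into one data frame
fetch_months <- function(year, months) {
  tables <- lapply(months, function(m) fetch_month(year, m))
  do.call(rbind, tables)
}
```

Calling `fetch_months(2018, 1:12)` would download each month in turn and stack the results with `rbind()`, which only works if every month's table has the same column names.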