aastha

Reputation: 45

Scrape table in R using rvest

I'm unable to scrape the table at the link below. I've inspected the page source and noted that the table has the class name tablesaw-sortable.

I tested the method below on a Wikipedia page and it extracted the table there. Is there any way to read this particular table?

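library(rvest)

# Download and parse the page
url <- read_html("https://www.wunderground.com/history/airport/KNYC/2015/01/01/DailyHistory.html?HideSpecis=0")

# Select nodes whose class attribute is exactly "tablesaw-sortable" and parse them as tables
weather_hourly <- url %>% 
  html_nodes(xpath='//*[@class="tablesaw-sortable"]') %>% 
  html_table()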

Upvotes: 0

Views: 477

Answers (1)

ASH

Reputation: 20362

OK, something like this should get you pretty close to what you want. Note that it pulls the data from timeanddate.com rather than wunderground.com.

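library("httr")
library("XML")

# Download the page to a temporary file, sending a browser-like user agent
URL <- "https://www.timeanddate.com/weather/usa/new-york/historic?month=8&year=2018"
temp <- tempfile(fileext = ".html")
GET(url = URL, user_agent("Mozilla/5.0"), write_disk(temp))

# Parse every HTML table in the file, then keep the second one
df <- readHTMLTable(temp)
df <- df[[2]]

df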

Create a small loop if you want to iterate through a bunch of URLs and import data from each.
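A minimal sketch of such a loop might look like the following. It assumes the other months can be reached by changing the month and year query parameters in the same URL, and that the table of interest is still the second one on each page; neither assumption has been checked against the site. The helper names build_url, fetch_table and scrape_all are hypothetical, made up for this example.

```r
library(httr)
library(XML)

# Hypothetical helper: build the URL for one month.
# Assumes the site accepts other month/year values in the same query format.
build_url <- function(month, year) {
  sprintf(
    "https://www.timeanddate.com/weather/usa/new-york/historic?month=%d&year=%d",
    month, year
  )
}

# Hypothetical helper: download one page and return one of its tables.
# table_index = 2 mirrors the df[[2]] step above and is an assumption
# for any page other than the one already tried.
fetch_table <- function(url, table_index = 2) {
  temp <- tempfile(fileext = ".html")
  on.exit(unlink(temp))  # remove the temporary file when the function exits
  GET(url = url, user_agent("Mozilla/5.0"), write_disk(temp))
  tables <- readHTMLTable(temp)
  tables[[table_index]]
}

# Loop over the URLs and collect one data frame per URL, keyed by URL.
scrape_all <- function(urls, pause = 1) {
  results <- list()
  for (u in urls) {
    results[[u]] <- fetch_table(u)
    Sys.sleep(pause)  # short pause between requests
  }
  results
}

# URLs for June through August 2018
urls <- vapply(6:8, build_url, character(1), year = 2018)
```

Calling scrape_all(urls) would then return a list with one data frame per month, which you can inspect or combine as needed.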

Upvotes: 1
