S31

Reputation: 934

Webscraping Error: inherited method not found for readHTMLTable in R

I'm currently working on someone else's code, and I've run into an issue I can't seem to get past. When I extract the data from a site for one of the datasets being used, I get the error shown below, which says it can't find an "inherited method" for readHTMLTable.

url = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml"
page <- readLines(url)
Warning message:
In readLines(url) :
  incomplete final line found on 'http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml'
ONI_data_raw <- data.table (readHTMLTable(page, which=8))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’

My goal is just to retrieve the dataset. I ran a similar piece of code for another dataset pulled from another website and got exactly the kind of result I'm trying to get from the code above:

url = "http://www.esrl.noaa.gov/psd/data/correlation/amon.us.data" 
AMO_data_raw <- read.table (url, header = FALSE, skip = 1, nrow = length(count.fields(url))-6)
headers <- c ("Year", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
setnames(AMO_data_raw, headers)
AMO_data_raw = melt(AMO_data_raw, id.vars = "Year")
names(AMO_data_raw)[2] = "Month"
names(AMO_data_raw)[3] = "AMO_Value"
AMO_data_raw$Month = as.numeric(AMO_data_raw$Month)
AMO_data_raw$AMO_Value= as.numeric(AMO_data_raw$AMO_Value)
AMO_data_raw = subset(AMO_data_raw, AMO_data_raw$Year %in% time )
head(AMO_data_raw)
  Year Month AMO_Value
3 1950     1     0.113
4 1951     1     0.105
5 1952     1     0.174
6 1953     1     0.268
7 1954     1     0.229
8 1955     1     0.078

Any advice would be greatly appreciated.

Upvotes: 0

Views: 544

Answers (1)

alistaire

Reputation: 43354

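If you look at the HTML you get from the first URL, it tells you the site has moved, and gives you a new URL. (If you look at it in a browser, it likely automatically redirects you.) That "moved" page presumably has no eighth table for which = 8 to select, which would explain why readHTMLTable ends up being called on NULL. Using the URL it redirects to and rvest for scraping,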

library(rvest)

h <- read_html('http://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php')

episodes <- h %>% 
    html_node('table[border="1"]') %>%    # get first table node with a border attribute of "1"
    html_table(header = TRUE, fill = TRUE) %>%    # parse the table
    dplyr::filter(Year != 'Year') %>%    # remove interior header rows
    readr::type_convert()    # convert types from character

str(episodes)
#> 'data.frame':    68 obs. of  13 variables:
#>  $ Year: num  1950 1951 1952 1953 1954 ...
#>  $ DJF : num  -1.5 -0.8 0.5 0.4 0.8 -0.7 -1.1 -0.2 1.8 0.6 ...
#>  $ JFM : num  -1.3 -0.5 0.4 0.6 0.5 -0.6 -0.8 0.1 1.7 0.6 ...
#>  $ FMA : num  -1.2 -0.2 0.3 0.6 0 -0.7 -0.6 0.4 1.3 0.5 ...
#>  $ MAM : num  -1.2 0.2 0.3 0.7 -0.4 -0.8 -0.5 0.7 0.9 0.3 ...
#>  $ AMJ : num  -1.1 0.4 0.2 0.8 -0.5 -0.8 -0.5 0.9 0.7 0.2 ...
#>  $ MJJ : num  -0.9 0.6 0 0.8 -0.5 -0.7 -0.5 1.1 0.6 -0.1 ...
#>  $ JJA : num  -0.5 0.7 -0.1 0.7 -0.6 -0.7 -0.6 1.3 0.6 -0.2 ...
#>  $ JAS : num  -0.4 0.9 0 0.7 -0.8 -0.7 -0.6 1.3 0.4 -0.3 ...
#>  $ ASO : num  -0.4 1 0.2 0.8 -0.9 -1.1 -0.5 1.3 0.4 -0.1 ...
#>  $ SON : num  -0.4 1.2 0.1 0.8 -0.8 -1.4 -0.4 1.4 0.4 0 ...
#>  $ OND : num  -0.6 1 0 0.8 -0.7 -1.7 -0.4 1.5 0.5 0 ...
#>  $ NDJ : num  -0.8 0.8 0.1 0.8 -0.7 -1.5 -0.4 1.7 0.6 0 ...
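
To see what the first URL actually returned, one option is to count the opening table tags in the output of readLines before asking for a specific table. The following is only a rough sketch in base R: count_tables is a hypothetical helper, and moved_page is made-up stand-in text, not the real response from the NOAA server.

```r
# Hypothetical helper: count opening <table> tags in the lines of an HTML page.
count_tables <- function(lines) {
    html <- paste(lines, collapse = "\n")
    hits <- gregexpr("<table\\b", html, ignore.case = TRUE, perl = TRUE)[[1]]
    # gregexpr() returns -1 when there is no match at all
    if (hits[1] == -1L) 0L else length(hits)
}

# Made-up stand-in for a "page has moved" response (NOT the real NOAA HTML).
moved_page <- c(
    "<html><head><title>Moved</title></head>",
    "<body><p>This page has moved.</p></body></html>"
)

n_tables <- count_tables(moved_page)
n_tables
#> [1] 0
```

With the real page object from the question, a result below 8 from count_tables(page) would mean which = 8 has nothing to select.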

Upvotes: 2
