Reputation: 934
I'm currently working on someone else's code, and I've run into an issue I can't get past. When it extracts the data from a site for one of the datasets being used, I get the error shown below, which says it can't find an "inherited method" for readHTMLTable.
url = "http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml"
page <- readLines(url)
Warning message:
In readLines(url) :
incomplete final line found on 'http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml'
ONI_data_raw <- data.table (readHTMLTable(page, which=8))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
My goal is just to get the dataset back as a table. I ran similar code for another dataset pulled from a different website and got exactly the kind of result I want from the code above:
url = "http://www.esrl.noaa.gov/psd/data/correlation/amon.us.data"
AMO_data_raw <- read.table (url, header = FALSE, skip = 1, nrow = length(count.fields(url))-6)
headers <- c ("Year", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
setnames(AMO_data_raw, headers)
AMO_data_raw = melt(AMO_data_raw, id.vars = "Year")
names(AMO_data_raw)[2] = "Month"
names(AMO_data_raw)[3] = "AMO_Value"
AMO_data_raw$Month = as.numeric(AMO_data_raw$Month)
AMO_data_raw$AMO_Value= as.numeric(AMO_data_raw$AMO_Value)
AMO_data_raw = subset(AMO_data_raw, AMO_data_raw$Year %in% time )
head(AMO_data_raw)
Year Month AMO_Value
3 1950 1 0.113
4 1951 1 0.105
5 1952 1 0.174
6 1953 1 0.268
7 1954 1 0.229
8 1955 1 0.078
Any advice would be greatly appreciated.
Upvotes: 0
Views: 544
Reputation: 43354
If you look at the HTML you get from the first URL, it tells you the site has moved and gives you a new URL. (If you open it in a browser, it likely redirects you automatically.) Using the URL it redirects to, and rvest for scraping:
library(rvest)
h <- read_html('http://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php')
episodes <- h %>%
    html_node('table[border="1"]') %>%            # get first table node with a border attribute of "1"
    html_table(header = TRUE, fill = TRUE) %>%    # parse the table
    dplyr::filter(Year != 'Year') %>%             # remove interior header rows
    readr::type_convert()                         # convert types from character
str(episodes)
#> 'data.frame': 68 obs. of 13 variables:
#> $ Year: num 1950 1951 1952 1953 1954 ...
#> $ DJF : num -1.5 -0.8 0.5 0.4 0.8 -0.7 -1.1 -0.2 1.8 0.6 ...
#> $ JFM : num -1.3 -0.5 0.4 0.6 0.5 -0.6 -0.8 0.1 1.7 0.6 ...
#> $ FMA : num -1.2 -0.2 0.3 0.6 0 -0.7 -0.6 0.4 1.3 0.5 ...
#> $ MAM : num -1.2 0.2 0.3 0.7 -0.4 -0.8 -0.5 0.7 0.9 0.3 ...
#> $ AMJ : num -1.1 0.4 0.2 0.8 -0.5 -0.8 -0.5 0.9 0.7 0.2 ...
#> $ MJJ : num -0.9 0.6 0 0.8 -0.5 -0.7 -0.5 1.1 0.6 -0.1 ...
#> $ JJA : num -0.5 0.7 -0.1 0.7 -0.6 -0.7 -0.6 1.3 0.6 -0.2 ...
#> $ JAS : num -0.4 0.9 0 0.7 -0.8 -0.7 -0.6 1.3 0.4 -0.3 ...
#> $ ASO : num -0.4 1 0.2 0.8 -0.9 -1.1 -0.5 1.3 0.4 -0.1 ...
#> $ SON : num -0.4 1.2 0.1 0.8 -0.8 -1.4 -0.4 1.4 0.4 0 ...
#> $ OND : num -0.6 1 0 0.8 -0.7 -1.7 -0.4 1.5 0.5 0 ...
#> $ NDJ : num -0.8 0.8 0.1 0.8 -0.7 -1.5 -0.4 1.7 0.6 0 ...
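If you then want the same long layout as your AMO data (one row per year and period), something like the following might work. It is only a sketch in base R: `episodes_sample` is a hypothetical two-row stand-in built from the first values in the `str()` output above, and the names `ONI_long`, `Season`, and `ONI_Value` are my own assumptions, chosen to mirror your AMO column names.

```r
# Hypothetical sample: first two years and three of the twelve seasons,
# copied from the str() output above. Swap in the real `episodes` data frame.
episodes_sample <- data.frame(
  Year = c(1950, 1951),
  DJF  = c(-1.5, -0.8),
  JFM  = c(-1.3, -0.5),
  FMA  = c(-1.2, -0.2)
)

# Every column except Year holds one three-month season
season_cols <- setdiff(names(episodes_sample), "Year")

# stack() concatenates the season columns into `values`
# and records the source column name in `ind`
stacked <- stack(episodes_sample[season_cols])

# Years repeat once per season column, in the same order stack() used
ONI_long <- data.frame(
  Year      = rep(episodes_sample$Year, times = length(season_cols)),
  Season    = as.character(stacked$ind),
  ONI_Value = stacked$values,
  stringsAsFactors = FALSE
)

# Sort by year so each year's seasons sit together
ONI_long <- ONI_long[order(ONI_long$Year), ]
print(ONI_long)
```

Note that the ONI columns are overlapping three-month seasons rather than calendar months, so whether and how to map them onto a numeric `Month` column is a decision for your analysis.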
Upvotes: 2