Reputation: 11
I am trying to download the data found at http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203 using R. From my understanding, this is a list. I've tried to use the XML package, but continue to get the error 'Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'readHTMLList' for signature '"NULL"''. I get the same error when using readHTMLTable() as well. This is how I've been using the function:
library(XML)
url = "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203"
mydata = readHTMLTable(url, which = 11, trim = TRUE)
I've also tried to include header = T, stringsAsFactors = F, and readLines(url) in the function options, to no avail. If I only needed one of the tables, I would manually download it, but I need a large volume of this data. My idea was to loop through the FROM= and TO= in the url to access the different days and times of the sounding data once I get the initial function to work. Any help would be awesome.
Upvotes: 1
Views: 242
Reputation: 94182
Using the rvest and readr packages (url is the string defined in the question):
> library(rvest)
> library(readr)
> txt = read_html(url) %>% html_node("pre") %>% html_text()
That gets the text from inside the <pre> tags. Then:
> data = txt %>% read_fwf(fwf_empty(., skip = 5), skip = 5)
makes a data frame of it:
> head(data)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1 1000.0 130 NA NA NA NA NA NA NA NA NA
2 963.0 456 13.2 8.8 75 7.43 240 1 289.4 310.7 290.8
3 962.0 465 15.2 9.2 67 7.64 247 1 291.6 313.6 292.9
4 955.0 527 18.4 8.4 52 7.29 295 1 295.4 316.8 296.7
5 945.8 610 18.9 7.2 47 6.79 0 1 296.7 316.9 298.0
6 944.0 626 19.0 7.0 46 6.70 15 1 297.0 316.9 298.2
Getting the names is left as an exercise for the reader....
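For anyone who wants that exercise spelled out, here is a rough sketch in base R. It assumes the <pre> text has a header row starting with PRES, followed by a units row, both fenced by dashed lines; that layout is my assumption about the page, not something shown above. The txt string below is a small stand-in (the two data rows are copied from the head(data) output), so the snippet runs without network access.

```r
# Stand-in for the `txt` string pulled from the <pre> tag.
# ASSUMPTION: the header/units layout below mirrors the real page.
txt <- paste(
  "",
  "-----------------------------------------------------------------------------",
  "   PRES   HGHT   TEMP   DWPT   RELH   MIXR   DRCT   SKNT   THTA   THTE   THTV",
  "    hPa     m      C      C      %    g/kg    deg   knot     K      K      K ",
  "-----------------------------------------------------------------------------",
  "  963.0    456   13.2    8.8     75   7.43    240      1  289.4  310.7  290.8",
  "  962.0    465   15.2    9.2     67   7.64    247      1  291.6  313.6  292.9",
  sep = "\n"
)

# Split the text into lines and take the first one mentioning PRES as the header.
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
header <- lines[grepl("PRES", lines, fixed = TRUE)][1]

# Break the header on whitespace and lower-case it to get column names.
col_names <- tolower(strsplit(trimws(header), "\\s+")[[1]])
print(col_names)

# With the real data frame from read_fwf() this would be applied as:
# names(data) <- col_names
```

If the header row is not where this expects it, col_names will come back as NA, so it is worth checking its length against ncol(data) before assigning.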
Upvotes: 2
Reputation: 78792
Thankfully, this is a plain text table wrapped in a <pre> tag, so we can read in the HTML, extract the text from the <pre> tag and then read it into a table while supplying decent column names and types:
library(rvest)
library(readr)
URL <- "http://weather.uwyo.edu/cgi-bin/sounding?region=naconf&TYPE=TEXT%3ALIST&YEAR=2016&MONTH=08&FROM=2612&TO=2612&STNM=71203"
pg <- read_html(URL)
html_nodes(pg, "pre")[[1]] %>%
  html_text() -> dat
read_table(dat, skip=5, col_types="ddddddddddd",
           col_names=c("pres", "hght", "temp", "dwpt", "relh", "mixr",
                       "drct", "sknt", "thta", "thte", "thtv")) -> df
dplyr::glimpse(df)
## Variables: 11
## $ pres <dbl> 1000.0, 963.0, 962.0, 955.0, 945.8, 944.0, 925.0, 912.8, 891.0, 880.8, 877.0, 850.0, 819.1...
## $ hght <dbl> 130, 456, 465, 527, 610, 626, 800, 914, 1121, 1219, 1256, 1522, 1829, 2134, 2438, 2743, 31...
## $ temp <dbl> NA, 13.2, 15.2, 18.4, 18.9, 19.0, 18.8, 18.2, 17.2, 17.4, 17.4, 15.0, 12.4, 9.8, 7.2, 4.7,...
## $ dwpt <dbl> NA, 8.8, 9.2, 8.4, 7.2, 7.0, 6.8, 6.2, 5.2, 5.3, 5.4, 4.0, 2.8, 1.6, 0.4, -0.9, -2.3, -2.5...
## $ relh <dbl> NA, 75, 67, 52, 47, 46, 46, 45, 45, 45, 45, 48, 52, 56, 62, 67, 75, 75, 73, 70, 68, 23, 17...
## $ mixr <dbl> NA, 7.43, 7.64, 7.29, 6.79, 6.70, 6.74, 6.57, 6.26, 6.40, 6.45, 6.03, 5.74, 5.46, 5.19, 4....
## $ drct <dbl> NA, 240, 247, 295, 0, 15, 175, 170, 72, 25, 22, 0, 335, 300, 290, 300, 300, 300, 300, 319,...
## $ sknt <dbl> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 3, 5, 5, 7, 6, 6, 6, 10, 10, 21, 21, 21, 21, 21, 21, ...
## $ thta <dbl> NA, 289.4, 291.6, 295.4, 296.7, 297.0, 298.5, 299.1, 300.1, 301.2, 301.6, 301.9, 302.3, 30...
## $ thte <dbl> NA, 310.7, 313.6, 316.8, 316.9, 316.9, 318.6, 318.8, 319.0, 320.6, 321.2, 320.2, 319.8, 31...
## $ thtv <dbl> NA, 290.8, 292.9, 296.7, 298.0, 298.2, 299.7, 300.3, 301.2, 302.4, 302.8, 302.9, 303.4, 30...
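On the looping part of the question: one way to go about it is to generate the URLs first and fetch them afterwards. The sketch below assumes FROM= and TO= encode day and hour as DDHH (so 2612 is day 26, 12Z); that reading comes from the example URL, not from any documentation. sounding_url() is a hypothetical helper name, and the days and hours are only illustrative.

```r
# Hypothetical helper: build the sounding URL for one station and one time.
# ASSUMPTION: FROM/TO are day and hour packed as DDHH, as in the question's URL.
sounding_url <- function(year, month, day, hour, stnm, region = "naconf") {
  ddhh <- sprintf("%02d%02d", day, hour)
  sprintf(
    "http://weather.uwyo.edu/cgi-bin/sounding?region=%s&TYPE=TEXT%%3ALIST&YEAR=%d&MONTH=%02d&FROM=%s&TO=%s&STNM=%s",
    region, year, month, ddhh, ddhh, stnm
  )
}

# Every combination of the illustrative days and hours (hour varies fastest).
times <- expand.grid(hour = c(0, 12), day = 25:26)

urls <- unname(mapply(
  sounding_url,
  year = 2016, month = 8,
  day = times$day, hour = times$hour,
  MoreArgs = list(stnm = "71203")
))
print(urls)

# Fetching needs network access, so it is left commented out here:
# library(rvest)
# soundings <- lapply(urls, function(u) {
#   Sys.sleep(1)  # pause between requests to go easy on the server
#   html_text(html_nodes(read_html(u), "pre")[[1]])
# })
```

Each element of soundings would then go through the same read_table() call as above. Requests for times with no sounding will probably not contain a <pre> tag, so wrapping the fetch in tryCatch() is likely worth it.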
Upvotes: 5