Reputation: 756
I'm trying to extract Met Office data using this code:
library(data.table)
met_office <- "https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt"
weather_data <- fread(met_office)
The problem is that I get this warning:
Warning message: In fread("https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt", : Discarded single-line footer: <<2020 123.6 287.6 88.6 25.8 18.4 135.5 134.9 176.5 78.2 197.2 135.0 558.4 132.8 446.9 410.5 >>
I have tried adding fill = TRUE, but this also keeps data that is not needed and ruins the format. Is there a way to keep the values that are dropped because of the NAs?
I have managed to do it with fill = TRUE, although I'd prefer a one-line fix within fread. Here's what I used:
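library(dplyr)  # provides %>%, mutate_all and na_if
weather_data <- fread(met_office, fill = TRUE)
weather_data <- weather_data %>% mutate_all(na_if, "")
# keep only the rows for 1991-2020 and columns 1 and 3
weather_data <- weather_data[weather_data$V1 %in% 1991:2020, c(1, 3)]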
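For context, the warning is consistent with the last line simply having fewer whitespace-separated fields than the rows above it. The check below is a minimal base R sketch of that idea, not a description of fread's internals. It uses the 2019 row and the discarded 2020 footer as quoted in this thread, with the file's spacing collapsed to single spaces; the variable names are made up for illustration.

```r
# Rows as quoted in this thread (spacing collapsed to single spaces).
full_row <- "2019 72.8 83.0 187.8 75.8 57.2 134.3 100.6 163.7 186.1 154.8 126.3 147.1 314.1 320.9 398.6 467.1 1489.6"
footer   <- "2020 123.6 287.6 88.6 25.8 18.4 135.5 134.9 176.5 78.2 197.2 135.0 558.4 132.8 446.9 410.5"

# Split each line on runs of whitespace and count the resulting fields.
field_counts <- lengths(strsplit(trimws(c(full_row, footer)), "\\s+"))
print(field_counts)  # 18 fields for 2019, 16 for 2020
```

Once the line is split on whitespace, nothing says which two columns are missing, so the remaining values can only be placed by position.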
Upvotes: 1
Views: 203
Reputation: 1471
Another approach is to read it as a fixed-width file:
# column names
col_nam <- c("year", "jan", "feb", "mar", "apr", "may", "jun",
             "jul", "aug", "sep", "oct", "nov", "dec", "win",
             "spr", "sum", "aut", "ann")
met_office <- "https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt"
weather_data <- read.fwf(
  file = url(met_office),
  skip = 6,
  widths = c(4, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8),
  header = FALSE,
  col.names = col_nam)
In this case the missing dec and ann values for 2020 are correctly read as NA:
> tail(weather_data)
year jan feb mar apr may jun jul aug sep oct nov dec win spr sum aut ann
154 2015 167.7 77.7 109.6 49.3 141.8 49.9 109.6 99.7 52.9 68.1 260.9 343.1 412.4 300.7 259.2 381.9 1530.2
155 2016 203.2 142.4 84.5 100.7 54.2 127.1 107.5 123.5 111.7 41.0 124.0 80.7 688.7 239.3 358.1 276.7 1300.4
156 2017 74.4 111.8 150.0 26.1 60.8 136.4 120.6 115.3 169.7 144.3 147.9 144.1 266.9 236.9 372.4 461.9 1401.5
157 2018 153.2 78.0 89.4 95.3 47.5 33.1 57.5 98.8 137.0 113.9 119.0 158.2 375.3 232.2 189.4 369.8 1180.8
158 2019 72.8 83.0 187.8 75.8 57.2 134.3 100.6 163.7 186.1 154.8 126.3 147.1 314.1 320.9 398.6 467.1 1489.6
159 2020 123.6 287.6 88.6 25.8 18.4 135.5 134.9 176.5 78.2 197.2 135.0 NA 558.4 132.8 446.9 410.5 NA
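To try the fixed-width idea without downloading anything, here is a small self-contained sketch. The four-column layout (year, nov, dec, ann) and the names `rows`, `con` and `sample_data` are made up for illustration; only the 2019 and 2020 values come from the output above. The rows are built with sprintf so each field has exactly the width that is passed to read.fwf.

```r
# Build two fixed-width rows: year (4 chars), nov (7), dec (7), ann (8).
# The 2020 row leaves dec and ann blank, like an incomplete year.
rows <- c(
  sprintf("%4d%7.1f%7.1f%8.1f", 2019L, 126.3, 147.1, 1489.6),
  sprintf("%4d%7.1f%7s%8s", 2020L, 135.0, "", "")
)

# Read from an in-memory text connection instead of a URL.
con <- textConnection(rows)
sample_data <- read.fwf(
  file = con,
  widths = c(4, 7, 7, 8),
  header = FALSE,
  col.names = c("year", "nov", "dec", "ann")
)
close(con)

print(sample_data)
```

Because each value is taken from a fixed character range, a blank field stays in its own column and comes back as NA instead of shifting the later values to the left.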
Upvotes: 2