Lime

Reputation: 756

Keeping the footer row that fread discards

I'm trying to extract Met Office data using this code:

library(data.table)
met_office <- "https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt"
weather_data <- fread(met_office)

The problem is that I get this warning:

Warning message: In fread("https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt", : Discarded single-line footer: <<2020 123.6 287.6 88.6 25.8 18.4 135.5 134.9 176.5 78.2 197.2 135.0 558.4 132.8 446.9 410.5 >>

I have tried setting fill = TRUE, but this also keeps lines that are not needed and ruins the format. Is there a way to keep the row that is discarded because of its missing values?
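
As a rough illustration of why the 2020 line is probably treated as a footer, the sketch below counts whitespace-separated fields in two rows of the file. The 2019 row is copied from the output further down and the 2020 row from the warning above; `count_fields` is a hypothetical helper, not part of data.table.

```r
# Two rows from the Met Office file: 2019 is complete, 2020 has no
# December value and no annual total yet.
row_2019 <- "2019 72.8 83.0 187.8 75.8 57.2 134.3 100.6 163.7 186.1 154.8 126.3 147.1 314.1 320.9 398.6 467.1 1489.6"
row_2020 <- "2020 123.6 287.6 88.6 25.8 18.4 135.5 134.9 176.5 78.2 197.2 135.0 558.4 132.8 446.9 410.5 "

# Hypothetical helper: number of whitespace-separated fields in a line.
count_fields <- function(line) {
  length(strsplit(trimws(line), "\\s+")[[1]])
}

fields_2019 <- count_fields(row_2019)  # year + 12 months + 5 summary columns
fields_2020 <- count_fields(row_2020)  # two values short of a full row

print(c(fields_2019, fields_2020))
```

When the file is split on whitespace, the 2020 line has fewer fields than the rows above it, and the blank December slot cannot be told apart from ordinary column padding.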

I have managed to do it with fill = TRUE, although I'd prefer a one-step fix within fread. Here's what I used:

library(dplyr)  # provides %>%, mutate_all() and na_if()

weather_data <- fread(met_office, fill = TRUE)
weather_data <- weather_data %>% mutate_all(na_if,"")
weather_data <- weather_data[weather_data$V1 %in% 1991:2020,c(1, 3)]

Upvotes: 1

Views: 203

Answers (1)

holzben

Reputation: 1471

Another approach would be to read it as a fixed-width file:

# column names: year, 12 months, 4 seasons, annual total
col_nam <- c("year", "jan", "feb", "mar", "apr", "may", "jun",
             "jul",  "aug", "sep", "oct", "nov", "dec", "win",
             "spr",  "sum", "aut", "ann")

met_office <- "https://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainfall/date/England_NW_and_N_Wales.txt"

weather_data <- read.fwf(
  file = url(met_office),
  skip = 6,                              # skip the header block at the top of the file
  widths = c(4, rep(7, 12), rep(8, 5)),  # year, 12 monthly columns, 5 seasonal/annual columns
  header = FALSE,
  col.names = col_nam
)

In this case the missing December value (and the annual total that depends on it) is correctly read as NA:

> tail(weather_data)
    year   jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec      win   spr   sum   aut    ann
154 2015 167.7  77.7 109.6  49.3 141.8  49.9 109.6  99.7  52.9  68.1 260.9 343.1    412.4 300.7 259.2 381.9 1530.2
155 2016 203.2 142.4  84.5 100.7  54.2 127.1 107.5 123.5 111.7  41.0 124.0  80.7    688.7 239.3 358.1 276.7 1300.4
156 2017  74.4 111.8 150.0  26.1  60.8 136.4 120.6 115.3 169.7 144.3 147.9 144.1    266.9 236.9 372.4 461.9 1401.5
157 2018 153.2  78.0  89.4  95.3  47.5  33.1  57.5  98.8 137.0 113.9 119.0 158.2    375.3 232.2 189.4 369.8 1180.8
158 2019  72.8  83.0 187.8  75.8  57.2 134.3 100.6 163.7 186.1 154.8 126.3 147.1    314.1 320.9 398.6 467.1 1489.6
159 2020 123.6 287.6  88.6  25.8  18.4 135.5 134.9 176.5  78.2 197.2 135.0    NA    558.4 132.8 446.9 410.5     NA
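
To try the fixed-width idea without downloading the file, here is a small offline sketch. It rebuilds the 2019 and 2020 rows as fixed-width lines using the same widths as above and reads them back with read.fwf. The helper `pad_left` and the name `demo_data` are made up for this illustration, and the rebuilt lines only approximate the layout of the real file.

```r
col_nam <- c("year", "jan", "feb", "mar", "apr", "may", "jun",
             "jul",  "aug", "sep", "oct", "nov", "dec", "win",
             "spr",  "sum", "aut", "ann")
widths <- c(4, rep(7, 12), rep(8, 5))

# Hypothetical helper: right-justify each field in its column width.
pad_left <- function(x, w) paste0(strrep(" ", w - nchar(x)), x)

# Values taken from the rows shown above; "" marks a missing entry.
row_2019 <- c("2019", "72.8", "83.0", "187.8", "75.8", "57.2", "134.3",
              "100.6", "163.7", "186.1", "154.8", "126.3", "147.1",
              "314.1", "320.9", "398.6", "467.1", "1489.6")
row_2020 <- c("2020", "123.6", "287.6", "88.6", "25.8", "18.4", "135.5",
              "134.9", "176.5", "78.2", "197.2", "135.0", "",
              "558.4", "132.8", "446.9", "410.5", "")

fwf_lines <- c(paste0(pad_left(row_2019, widths), collapse = ""),
               paste0(pad_left(row_2020, widths), collapse = ""))

# Read the two lines back by column position rather than by separator.
con <- textConnection(fwf_lines)
demo_data <- read.fwf(con, widths = widths, header = FALSE,
                      col.names = col_nam)
close(con)

print(demo_data[, c("year", "nov", "dec", "win", "ann")])
```

Because the columns are cut by position, the blank December slot stays in the dec column and is read as NA instead of shifting the seasonal totals to the left.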

Upvotes: 2
