Reputation: 21
I have a dataframe, which contains dates of many years. It looks like this (instead of three years, I have 40 years):
DATES<-c(seq(as.Date('2017-01-01'), as.Date('2019-12-31'), by = 'days'))
df<-data.frame(DATES)
for each day I want to add the season. Hereby, Spring should start at the 20th of March, Summer at the 21st of June, Autumn at the 23rd of September, Winter at the 21st of December. These dates stay unchanged over the years.
I came up with the following code, which works (at least, I think so). However, I was wondering, if there isn't are more elegant way, to get what I want.
df$MONTH<-month(df$DATES)
df$DAY<-mday(df$DATES)
df$DAY_PLUS_MONTH<-df$DAY+df$MONTH*100
df <- df %>%
mutate(SEASON = case_when(
DAY_PLUS_MONTH %in% 320:620 ~ 'SPRING',
DAY_PLUS_MONTH %in% 621:922 ~ 'SUMMER',
DAY_PLUS_MONTH %in% 923:1221 ~ 'AUTUMN',
TRUE ~ 'WINTER'))
Upvotes: 1
Views: 1252
Reputation: 463
Using the hydroTSM
r package function of time2season
is another option. More information available at :https://www.rdocumentation.org/packages/hydroTSM/versions/0.6-0/topics/time2season
df$season<-time2season(df$datetime, out.fmt="seasons")
Upvotes: 0
Reputation: 160447
Using $yday
(whether from lubridate
or as.POSIXlt
) may give erroneous results for leap-years. I think a safer method is to create a vector of each of those dates across the years, adding one year in each direction (before/after).
I'm using findInterval
, but it's close-enough to cut
that you could use that method using the same variables here.
season_dates <- as.Date(sort(c(outer(
do.call(seq.int, as.list(1900 + as.POSIXlt(range(df$DATES) + c(-365, 365))$year)),
c("-03-20", "-06-21", "-09-23", "-12-21"),
paste0))))
season_dates
# [1] "2016-03-20" "2016-06-21" "2016-09-23" "2016-12-21" "2017-03-20" "2017-06-21" "2017-09-23" "2017-12-21" "2018-03-20"
# [10] "2018-06-21" "2018-09-23" "2018-12-21" "2019-03-20" "2019-06-21" "2019-09-23" "2019-12-21" "2020-03-20" "2020-06-21"
# [19] "2020-09-23" "2020-12-21"
season_names <- rep(c("Spring", "Summer", "Autumn", "Winter"), length.out = length(season_dates))
season_names
# [1] "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring"
# [14] "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter"
set.seed(42)
as_tibble(df) %>%
mutate(SEASON = season_names[ findInterval(DATES, season_dates) ]) %>%
sample_n(10) %>%
arrange(DATES)
# # A tibble: 10 x 2
# DATES SEASON
# <date> <chr>
# 1 2017-01-24 Winter
# 2 2017-02-18 Winter
# 3 2017-06-14 Spring
# 4 2017-11-17 Autumn
# 5 2017-12-22 Winter
# 6 2018-02-14 Winter
# 7 2018-07-15 Summer
# 8 2018-09-14 Summer
# 9 2018-09-26 Autumn
# 10 2019-06-18 Spring
I sampled the output just to show some variance, as otherwise the first 10 results were all winter. Also, I used as.POSIXlt(.)$year
and then had to adjust it since it is 1900-based. lubridate::year
would work here, too.
Upvotes: 2
Reputation: 173858
I think this should work for you:
cut(lubridate::yday(df$DATES - lubridate::days(79)),
breaks = c(0, 93, 187, 276, Inf),
labels = c("Spring", "Summer", "Autumn", "Winter"))
Upvotes: 2