LuckyLuke
LuckyLuke

Reputation: 21

R: assign seasons to dates by day and month

I have a dataframe, which contains dates of many years. It looks like this (instead of three years, I have 40 years):

DATES<-c(seq(as.Date('2017-01-01'), as.Date('2019-12-31'), by = 'days'))
df<-data.frame(DATES)

for each day I want to add the season. Hereby, Spring should start at the 20th of March, Summer at the 21st of June, Autumn at the 23rd of September, Winter at the 21st of December. These dates stay unchanged over the years.

I came up with the following code, which works (at least, I think so). However, I was wondering, if there isn't are more elegant way, to get what I want.

df$MONTH<-month(df$DATES)
df$DAY<-mday(df$DATES)
df$DAY_PLUS_MONTH<-df$DAY+df$MONTH*100

df <- df %>%
  mutate(SEASON = case_when(
    DAY_PLUS_MONTH %in% 320:620 ~ 'SPRING',
    DAY_PLUS_MONTH %in% 621:922 ~ 'SUMMER',
    DAY_PLUS_MONTH %in% 923:1221 ~ 'AUTUMN',
    TRUE ~ 'WINTER'))

Upvotes: 1

Views: 1252

Answers (3)

ay__ya
ay__ya

Reputation: 463

Using the hydroTSM r package function of time2season is another option. More information available at :https://www.rdocumentation.org/packages/hydroTSM/versions/0.6-0/topics/time2season

df$season<-time2season(df$datetime, out.fmt="seasons")

Upvotes: 0

r2evans
r2evans

Reputation: 160447

Using $yday (whether from lubridate or as.POSIXlt) may give erroneous results for leap-years. I think a safer method is to create a vector of each of those dates across the years, adding one year in each direction (before/after).

I'm using findInterval, but it's close-enough to cut that you could use that method using the same variables here.

season_dates <- as.Date(sort(c(outer(
  do.call(seq.int, as.list(1900 + as.POSIXlt(range(df$DATES) + c(-365, 365))$year)), 
  c("-03-20", "-06-21", "-09-23", "-12-21"),
  paste0))))
season_dates
#  [1] "2016-03-20" "2016-06-21" "2016-09-23" "2016-12-21" "2017-03-20" "2017-06-21" "2017-09-23" "2017-12-21" "2018-03-20"
# [10] "2018-06-21" "2018-09-23" "2018-12-21" "2019-03-20" "2019-06-21" "2019-09-23" "2019-12-21" "2020-03-20" "2020-06-21"
# [19] "2020-09-23" "2020-12-21"
season_names <- rep(c("Spring", "Summer", "Autumn", "Winter"), length.out = length(season_dates))
season_names
#  [1] "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter" "Spring"
# [14] "Summer" "Autumn" "Winter" "Spring" "Summer" "Autumn" "Winter"

set.seed(42)
as_tibble(df) %>%
  mutate(SEASON = season_names[ findInterval(DATES, season_dates) ]) %>%
  sample_n(10) %>%
  arrange(DATES)
# # A tibble: 10 x 2
#    DATES      SEASON
#    <date>     <chr> 
#  1 2017-01-24 Winter
#  2 2017-02-18 Winter
#  3 2017-06-14 Spring
#  4 2017-11-17 Autumn
#  5 2017-12-22 Winter
#  6 2018-02-14 Winter
#  7 2018-07-15 Summer
#  8 2018-09-14 Summer
#  9 2018-09-26 Autumn
# 10 2019-06-18 Spring

I sampled the output just to show some variance, as otherwise the first 10 results were all winter. Also, I used as.POSIXlt(.)$year and then had to adjust it since it is 1900-based. lubridate::year would work here, too.

Upvotes: 2

Allan Cameron
Allan Cameron

Reputation: 173858

I think this should work for you:

cut(lubridate::yday(df$DATES - lubridate::days(79)), 
    breaks = c(0, 93, 187, 276, Inf), 
    labels = c("Spring", "Summer", "Autumn", "Winter"))

Upvotes: 2

Related Questions