milzyrj21

Reputation: 23

How to find seasonal means over multiple years in R

My data set has flow rate measurements of a river for every day of the year from 1967 to 2021. This is split up into seasons: Winter (December, Jan, Feb), Spring (March, April, May), Summer (June, July, August) and Autumn (September, October, November).

This is a sample of my data set:

> (south_newton_wylye)
# A tibble: 20,100 x 7
   river year  season month   date                flow_rate quality
   <chr> <fct> <fct>  <fct>   <dttm>                  <dbl> <chr>  
 1 wylye 1967  Winter January 1967-01-01 00:00:00      6.67 Good   
 2 wylye 1967  Winter January 1967-01-02 00:00:00      6.39 Good   
 3 wylye 1967  Winter January 1967-01-03 00:00:00      6.32 Good   
 4 wylye 1967  Winter January 1967-01-04 00:00:00      6.34 Good   
 5 wylye 1967  Winter January 1967-01-05 00:00:00      6.37 Good   
 6 wylye 1967  Winter January 1967-01-06 00:00:00      6.45 Good   
 7 wylye 1967  Winter January 1967-01-07 00:00:00      6.65 Good   
 8 wylye 1967  Winter January 1967-01-08 00:00:00      6.54 Good   
 9 wylye 1967  Winter January 1967-01-09 00:00:00      6.53 Good   
10 wylye 1967  Winter January 1967-01-10 00:00:00      6.62 Good   
# ... with 20,090 more rows

I would like to find the mean flow rate of each season for each year. I am struggling to write code for the winter season, which runs across two years (e.g. December 1967, Jan 1968, Feb 1968).

This was my initial code:

stats.3 <- south_newton_wylye %>% group_by(season, year) %>% 
  summarise(mean = mean(flow_rate), sd = sd(flow_rate), n = n(),
            se = sd/sqrt(n))
stats.3

But for the winter season it groups months of the same calendar year (Jan, Feb, Dec 1967) rather than a winter that starts in December and carries on into Jan and Feb of the following year. I would also like a second version of the code that does all of the above but leaves out the Autumn season, keeping only winter, spring and summer. Does anyone know how I can go about this? Thanks :)

Upvotes: 2

Views: 402

Answers (1)

Léon Ipdjian

Reputation: 818

Your problem is far easier to solve if you use your date variable.

Using dplyr:

Reproducible example

library(dplyr) # provides %>% and the verbs used below

# One observation per month, from December 1966 to December 1967
dates <- as.Date(c("1966-12-01","1967-01-01","1967-02-01","1967-03-01","1967-04-01",
                   "1967-05-01","1967-06-01","1967-07-01","1967-08-01","1967-09-01",
                   "1967-10-01","1967-11-01","1967-12-01"))
season <- c("Winter","Winter","Winter","Spring","Spring","Spring","Summer","Summer","Summer","Autumn","Autumn","Autumn","Winter")
var <- c(1,2,3,5,5,5,7,7,7,9,9,9,10)

Resolution

df <- data.frame(dates,season,var) %>% # creating the data frame
  dplyr::mutate(month = as.numeric(format(dates,"%m")),
                year = as.numeric(format(dates,"%Y")),
                season_id =  (12*year + month) %/% 3) %>% # generating an identifier for every season that exists in your data
  dplyr::group_by(season_id) %>% # grouping by the id
  dplyr::summarise(var = mean(var)) # computing the statistics you need

Note that this solution does not require three values per season: an incomplete season is simply averaged over whatever observations it has. Also note that the years have to be consecutive, which is probably what you meant in your original post.

A bit more explanation:

  1. The code generates month and year variables.
  2. It then computes an identifier for each month. For instance, January 1967 is month number 1967*12 + 1.
  3. Each season is made up of 3 months, so the integer (Euclidean) division of the month number by 3 gives the season number. Because 12*year is a multiple of 3, December (month 12) falls into the same group as January and February of the following year. You can then group by this season number to obtain the statistics you want.
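To make the arithmetic concrete, here is a small base R sketch for four dates around a winter boundary. The name `month_index` is only an illustrative helper, not something from the code above.

```r
# Month index and season id for a few dates around a winter boundary
dates <- as.Date(c("1966-12-01", "1967-01-01", "1967-02-01", "1967-03-01"))
month <- as.numeric(format(dates, "%m"))
year  <- as.numeric(format(dates, "%Y"))

month_index <- 12 * year + month  # 23604 23605 23606 23607
season_id   <- month_index %/% 3  # 7868  7868  7868  7869

data.frame(dates, month_index, season_id)
```

December 1966, January 1967 and February 1967 share the id 7868, while March 1967 starts the next season with 7869.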

Edit

To add readable labels to the season ids:

df <- data.frame(dates,season,var) %>% # creating the data frame
  dplyr::mutate(month = as.numeric(format(dates,"%m")),
                year = as.numeric(format(dates,"%Y")),
                season_id =  (12*year + month) %/% 3) %>% # generating an identifier for every season that exists in your data
  dplyr::group_by(season_id) %>% # grouping by the id
  dplyr::mutate(season_label = paste(min(year),season)) %>%  # or max, depending on your definition of the "winter of a year"
  dplyr::group_by(season_id,season_label) %>% # grouping by season_label too so the label survives the summarise below
  dplyr::summarise(var = mean(var)) # computing the statistics you need

Also keep season_id in the result: without it, sorting the seasons chronologically becomes awkward.
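For the second part of the question (the same statistics without Autumn), a possible sketch is to filter before grouping. The data frame `flow` below is a made-up stand-in for south_newton_wylye with random flow rates, and the column names `mean_flow`, `sd_flow` and `first_year` are my own choices. With the real data you would start from south_newton_wylye instead; note that its date column is a date-time, so format() will use its time zone.

```r
library(dplyr)

# Hypothetical stand-in for south_newton_wylye: two years of daily values
set.seed(1)
dates <- seq(as.Date("1967-01-01"), as.Date("1968-12-31"), by = "day")
month_num <- as.numeric(format(dates, "%m"))
flow <- data.frame(
  date = dates,
  # Dec, Jan, Feb -> Winter; Mar-May -> Spring; Jun-Aug -> Summer; Sep-Nov -> Autumn
  season = c("Winter", "Spring", "Summer", "Autumn")[(month_num %/% 3) %% 4 + 1],
  flow_rate = runif(length(dates), 5, 8)  # made-up flow rates
)

stats <- flow %>%
  filter(season != "Autumn") %>%                 # drop Autumn entirely
  mutate(month = as.numeric(format(date, "%m")),
         year  = as.numeric(format(date, "%Y")),
         season_id = (12 * year + month) %/% 3) %>%
  group_by(season_id, season) %>%
  summarise(first_year = min(year),              # year in which the season starts
            mean_flow  = mean(flow_rate),
            sd_flow    = sd(flow_rate),
            n          = n(),
            se         = sd_flow / sqrt(n),
            .groups = "drop") %>%
  arrange(season_id)                             # chronological order

stats
```

In this stand-in the first and last winters are incomplete (January to February 1967, and December 1968 only), which shows up in their smaller n.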

Upvotes: 2
