89_Simple
89_Simple

Reputation: 3805

sum overlapping groups in R

My dataframe consists of monthly weather data as follows for a given location

set.seed(123)
dat <- 
    data.frame(Year = rep(1980:1985, each = 12),
               Month = rep(1:12, times = 6),
               value = runif(12*6))

I have split the year into seasons as shown below.

s1 <- c(11, 12, 1, 2) #  season 1 consists of month 11, 12, 1 and 2 i.e. cuts across years
s2 <- c(3, 4, 5) # season 2 consists of month 3, 4, 5
s3 <- c(6, 7, 8, 9, 10) # season 3 consists of month 6, 7, 8, 9, 10
  

Taking example for 1980 - season 1 is Nov-Dec from 1979 and Jan-Feb from 1980 season 2 is from March - May of 1980 season 3 is June - Oct of 1980

However, for year 1980, season 1 is incomplete since it only has months 1 and 2 and missing the months 11 and 12 from 1979. In contrast, for year 1985 season 1 to season 3 is complete and hence I do not need months 11 and 12 from 1985 since it contributes to 1986 season1

With this background, I want to sum monthly values of each season by year so that the dataframe is in year X season format instead of year-month format In doing so there will be no values for 1980 season1 since it has missing months.
For cases when months cut across years, I don't know how to sum individual months?

 library(dplyr)


 season_list <- list(s1, s2, s3)

 temp_list <- list()          
 for(s in seq_along(season_list)){
   
   season_ref <- unlist(season_list[s])
   
   if(sum(diff(season_ref) < 0) != 0){  # check if season cuts across years
     
     dat %>% 
       dplyr::filter(Month %in% season_ref) %>%
       
       # how do I sum across years for this exception 
     
   } else { 
     
     # if season does not cut across years, simply filter the months in each year and add
     temp_list[[s]] <- 
     dat %>% 
       dplyr::filter(Month %in% season_ref) %>%
       dplyr::group_by(Year) %>%
       dplyr::summarise(season_value = sum(value)) %>%
       dplyr::mutate(season = s)
   }
 }   
 

    

Upvotes: 0

Views: 110

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269586

Assuming that you want to sum the values for each season calculate the Season and endYear (the year that the season ends) and then sum by those.

dat %>%
  group_by(endYear = Year + (Month %in% 11:12),
           Season = 1 * (Month %in% s1) + 
                    2 * (Month %in% s2) +
                    3 * (Month %in% s3)) %>%
  summarize(value = sum(value), .groups = "drop")

giving:

# A tibble: 19 x 3
   endYear Season value
     <int>  <dbl> <dbl>
 1    1980      1 1.08 
 2    1980      2 2.23 
 3    1980      3 2.47 
 4    1981      1 2.66 
 5    1981      2 1.25 
 6    1981      3 2.91 
 7    1982      1 3.00 
 8    1982      2 1.43 
 9    1982      3 3.50 
10    1983      1 1.48 
11    1983      2 0.693
12    1983      3 1.49 
13    1984      1 1.82 
14    1984      2 1.29 
15    1984      3 1.77 
16    1985      1 2.03 
17    1985      2 1.47 
18    1985      3 3.31 
19    1986      1 1.38 

Upvotes: 0

Related Questions