89_Simple

Reputation: 3805

R dplyr perform different aggregation by group

I have a dataframe dat which looks like this:

   dat <- structure(list(cell.ID = c(329574L, 329574L, 329574L, 329574L, 
    329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 
    329574L), Year = c("2010", "2010", "2010", "2010", "2010", "2010", 
    "2010", "2010", "2010", "2010", "2010", "2010"), month_name = c("June", 
    "July", "June", "July", "June", "July", "June", "July", "June", 
    "July", "June", "July"), value = c(459.860986624053, 398.94083733151, 
    16, 23, 111.69, 453.333, 71.55, 30.38, 31.928, 30.13355, 17.587, 
    19.7938709677419), variable_name = c("ETo", "ETo", "Rday", "Rday", 
    "Rsum", "Rsum", "Thdd", "Thdd", "Tmax", "Tmax", "Tmin", "Tmin"
    ), monthID = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L
    )), row.names = c(NA, -12L), class = "data.frame")


library(dplyr)

dat  %>%
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(variable = sum(value))

I want to average Tmax and Tmin and sum the rest of the variables, so I tried this:

dat %>%        
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), mean(value), sum(value)))

Error: Column `variable` must be length 1 (a summary value), not 2  

How do I correct this?

Upvotes: 0

Views: 64

Answers (2)

Cettt

Reputation: 11981

Another way to do this in dplyr is to use if and else instead of ifelse:

dat %>%        
  group_by(Year, variable_name) %>% 
  summarise(variable = if (variable_name[1] %in% c('Tmax', 'Tmin')) mean(value) else sum(value))

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
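
To see why if/else works where ifelse fails, here is a minimal base R sketch of what happens inside a single group. The two vectors stand in for the Tmax group from the question; res_ifelse and res_if are names made up for illustration.

```r
# Stand-ins for one group (Tmax, June and July) from the question's data
variable_name <- c("Tmax", "Tmax")
value <- c(31.928, 30.13355)

# ifelse is vectorized over its test, so the result has one element per row
res_ifelse <- ifelse(variable_name %in% c("Tmax", "Tmin"), mean(value), sum(value))
length(res_ifelse)  # 2

# if/else takes a single condition (hence the [1]) and returns a single value
res_if <- if (variable_name[1] %in% c("Tmax", "Tmin")) mean(value) else sum(value)
length(res_if)  # 1
```

The length-2 result is what triggers the "must be length 1" error in summarise, while the if/else version yields exactly one summary value per group.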

Upvotes: 2

ulfelder

Reputation: 5335

I think the problem is that ifelse is vectorized: inside summarise it returns one value per row of the group (two here) rather than a single value for the group. You can work around that by computing both summary statistics first and then conditionally selecting the one you want by variable name, like this:

dat %>%        
dplyr::group_by(Year, variable_name) %>% 
dplyr::summarise(var_mean = mean(value), var_sum = sum(value)) %>%
dplyr::mutate(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), var_mean, var_sum)) %>%
dplyr::select(-var_mean, -var_sum)

Result:

# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859. 
2 2010  Rday              39  
3 2010  Rsum             565. 
4 2010  Thdd             102. 
5 2010  Tmax              31.0
6 2010  Tmin              18.7
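
The same compute-both-then-select idea can be cross-checked without dplyr. Below is a rough base R sketch on a subset of the question's data (Rsum and Tmax only); the names small, var_mean, var_sum and out are made up for illustration.

```r
# Subset of the question's data: two variables, two months each
small <- data.frame(
  variable_name = c("Rsum", "Rsum", "Tmax", "Tmax"),
  value = c(111.69, 453.333, 31.928, 30.13355)
)

# One value per group for each statistic
var_mean <- tapply(small$value, small$variable_name, mean)
var_sum  <- tapply(small$value, small$variable_name, sum)

# ifelse is safe here: the test has one element per group, not one per row
out <- ifelse(names(var_mean) %in% c("Tmax", "Tmin"), var_mean, var_sum)
names(out) <- names(var_mean)
out
```

Here Rsum comes out as the sum of its two values and Tmax as the mean, in line with the Rsum and Tmax rows of the tibble above.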

Upvotes: 2
