How to apply summary function on two different types of data

Question

I have a data frame with multiple variables. Some columns contain only 0s and 1s, while the other columns can contain any value.
How can I summarize the columns that contain only 0s and 1s with "sts_1=sum(sts_1*0.25,na.rm=T)" and the other columns with "non_sts_3=mean(non_sts_3,na.rm = T)", without specifying the column names?

df <- data.frame(year=c("2014","2014","2015","2015","2015"),
                 month_=c("Jan","Jan","Jan","Jan","Feb"),
                 sts_1=c(0,1,1,1,0),
                 sts_2=c(1,0,0,1,NA),
                 non_sts_1=c(0,3,7,31,10),
                 non_sts_2=c(1,4,NA,12,6),
                 non_sts_3 = c(12,14,18,1,9))

With dplyr I can do it by entering the column names manually, as in the code below:

library(dplyr)
df <- group_by(df, year, month_)

df_aggregation<-summarise(df,
                          non_sts_1=mean(non_sts_1,na.rm = T),
                          non_sts_2=mean(non_sts_2,na.rm = T),
                          non_sts_3=mean(non_sts_3,na.rm = T),
                          sts_1=sum(sts_1*0.25,na.rm=T),
                          sts_2=sum(sts_2*0.25,na.rm=T))

Thanks in advance...

r2evans · Accepted Answer

@akrun's answer is straightforward. If you would rather avoid unnecessary calculations, however, you can define a function that picks the right summary directly:

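library(dplyr)

# Scale-and-sum vectors that contain only 0s and 1s; average everything else.
# The 0/1 test runs on each group's slice of a column, and NA %in% 0:1 is
# FALSE, so a slice containing NA falls through to mean().
mysumm <- function(x, na.rm = FALSE) {
  if (all(x %in% 0:1)) {
    sum(x * 0.25, na.rm = na.rm)
  } else {
    mean(x, na.rm = na.rm)
  }
}

df %>%
  group_by(year, month_) %>%
  summarise_if(is.numeric, mysumm, na.rm = TRUE)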
# # A tibble: 3 x 7
# # Groups:   year [?]
#     year month_ sts_1 sts_2 non_sts_1 non_sts_2 non_sts_3
# 1   2014    Jan  0.25  0.25       1.5       2.5      13.0
# 2   2015    Feb  0.00   NaN      10.0       6.0       9.0
# 3   2015    Jan  0.50  0.25      19.0      12.0       9.5
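One caveat with mysumm is that it classifies each group's slice of a column rather than the whole column, so a non_sts column whose values in some group happen to be only 0s and 1s would be summed instead of averaged. Below is a minimal sketch that fixes the column roles once, on the ungrouped data, and then applies each summary with across(). It assumes dplyr 1.0.0 or later (for across(), all_of() and the .groups argument); is_binary, binary_cols and other_cols are hypothetical names introduced here, not part of either answer. The data frame is re-created so the snippet runs on its own.

```r
library(dplyr)

df <- data.frame(year = c("2014", "2014", "2015", "2015", "2015"),
                 month_ = c("Jan", "Jan", "Jan", "Jan", "Feb"),
                 sts_1 = c(0, 1, 1, 1, 0),
                 sts_2 = c(1, 0, 0, 1, NA),
                 non_sts_1 = c(0, 3, 7, 31, 10),
                 non_sts_2 = c(1, 4, NA, 12, 6),
                 non_sts_3 = c(12, 14, 18, 1, 9))

# Hypothetical helper: TRUE for a numeric column whose non-missing values
# are all 0 or 1.
is_binary <- function(x) is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1))

# Decide each column's role once, over the whole (ungrouped) column.
binary_cols <- names(df)[vapply(df, is_binary, logical(1))]
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
other_cols <- setdiff(numeric_cols, binary_cols)

df_aggregation <- df %>%
  group_by(year, month_) %>%
  summarise(across(all_of(binary_cols), ~ sum(.x * 0.25, na.rm = TRUE)),
            across(all_of(other_cols), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")
```

On the example data this gives the same numbers as the output above, except that sts_2 for 2015/Feb becomes 0 rather than NaN: the column is treated as binary as a whole, and sum(NA * 0.25, na.rm = TRUE) is 0.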
