Prasanna Murugesan
Prasanna Murugesan

Reputation: 45

Dynamic grouping of multiple columns

I have a dataframe with an ID column and several columns(one for each month). The number of month columns will vary each time I run the program. I am writing R code to extract the month column names to a vector and for each element of the vector, I am trying to group by ID variable and get the sum for each group. The following is the code I have used. Everything works fine expect the the result comes as overall sum instead of group sum. Any help is much appreciated.

Current pattern of result:
newdf for a given month:

ID summ_mon_2017_12
1   20
2   20
3   20


Expected pattern of result:
newdf for a given month:

ID summ_mon_2017_12
1   8
2   5
3   7

library(dplyr)

ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))

groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)

#extract month columns
month_vec <- as.vector(unique(substring(names(groupsum %>%  
                                                  select(contains("mon_"))),1,11)))                   

addsummvar<-function(df, n1){


  newvar<-paste("summ",n1,sep="_")
  newdf<-paste("summ",n1,sep="_")

  print(newvar)
  print(newdf)

  newdf<- df %>%
    group_by(ID) %>%
    summarise(
      !!newvar := sum(df[n1])
    )

  print(newdf)
}

for(i in 1:length(month_vec)) {  
  summ_del <- addsummvar(df=groupsum, n1=month_vec[i])
}

Upvotes: 0

Views: 322

Answers (1)

amrrs
amrrs

Reputation: 6325

Can't this one-liner help your purpose?

library(dplyr)

ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))

groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)


groupsum %>% group_by(ID) %>% summarise_all(.funs = sum)

Output:

> groupsum %>% group_by(ID) %>% summarise_all(.funs = sum)
# A tibble: 3 x 7
     ID mon_2017_12 mon_2018_01 mon_2018_02 mon_2018_03 mon_2018_04 mon_2018_05
  <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
1  1.00    -0.00419       0.230       2.87       -1.38       -2.63         2.12
2  2.00    -1.12          1.45       -1.08       -0.907      -0.966        2.86
3  3.00    -2.11          3.49        0.282      -1.35        6.30        -3.60

Updated code to filter out non-numeric columns with summarise_if:

library(dplyr)

set.seed(123)

ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))


groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)


groupsum$var2 <- c(letters[1:14])

head(groupsum)

groupsum %>% group_by(ID) %>% summarise_if(is.numeric,sum)

Upvotes: 1

Related Questions