Reputation: 45
I have a dataframe with an ID column and several columns(one for each month). The number of month columns will vary each time I run the program. I am writing R code to extract the month column names to a vector and for each element of the vector, I am trying to group by ID variable and get the sum for each group. The following is the code I have used. Everything works fine expect the the result comes as overall sum instead of group sum. Any help is much appreciated.
Current pattern of result:
newdf for a given month:
ID summ_mon_2017_12
1 20
2 20
3 20
Expected pattern of result:
newdf for a given month:
ID summ_mon_2017_12
1 8
2 5
3 7
library(dplyr)
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))
groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)
#extract month columns
month_vec <- as.vector(unique(substring(names(groupsum %>%
select(contains("mon_"))),1,11)))
addsummvar<-function(df, n1){
newvar<-paste("summ",n1,sep="_")
newdf<-paste("summ",n1,sep="_")
print(newvar)
print(newdf)
newdf<- df %>%
group_by(ID) %>%
summarise(
!!newvar := sum(df[n1])
)
print(newdf)
}
for(i in 1:length(month_vec)) {
summ_del <- addsummvar(df=groupsum, n1=month_vec[i])
}
Upvotes: 0
Views: 322
Reputation: 6325
Can't this one-liner help your purpose?
library(dplyr)
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))
groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)
groupsum %>% group_by(ID) %>% summarise_all(.funs = sum)
Output:
> groupsum %>% group_by(ID) %>% summarise_all(.funs = sum)
# A tibble: 3 x 7
ID mon_2017_12 mon_2018_01 mon_2018_02 mon_2018_03 mon_2018_04 mon_2018_05
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.00 -0.00419 0.230 2.87 -1.38 -2.63 2.12
2 2.00 -1.12 1.45 -1.08 -0.907 -0.966 2.86
3 3.00 -2.11 3.49 0.282 -1.35 6.30 -3.60
Updated code to filter out non-numeric columns with summarise_if
:
library(dplyr)
set.seed(123)
ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3)
mon_2017_12<-c(rep(rnorm(14)))
mon_2018_01<-c(rep(rnorm(14)))
mon_2018_02<-c(rep(rnorm(14)))
mon_2018_03<-c(rep(rnorm(14)))
mon_2018_04<-c(rep(rnorm(14)))
mon_2018_05<-c(rep(rnorm(14)))
groupsum<-data.frame(ID,mon_2017_12,mon_2018_01,mon_2018_02,mon_2018_03,mon_2018_04,mon_2018_05)
groupsum$var2 <- c(letters[1:14])
head(groupsum)
groupsum %>% group_by(ID) %>% summarise_if(is.numeric,sum)
Upvotes: 1