Reputation: 177
I have data frame with multiple variable , some variables those contains only 0's and 1's and other columns contains all the possible values.
How to summarize df columns contains only 0's & 1's with "sts_1=sum(sts_1*0.25,na.rm=T)" and other columns with "non_sts_3=mean(non_sts_3,na.rm = T)," with out specifying column name.
df <- data.frame(year=c("2014","2014","2015","2015","2015"),
month_=c("Jan","Jan","Jan","Jan","Feb"),
sts_1=c(0,1,1,1,0),
sts_2=c(1,0,0,1,NA),
non_sts_1=c(0,3,7,31,10),
non_sts_2=c(1,4,NA,12,6),
non_sts_3 = c(12,14,18,1,9))
We can do by dplyr by entering column names manually with below code
df<-group_by(df,year, month_)
df_aggregation<-summarise(df,
non_sts_1=mean(non_sts_1,na.rm = T),
non_sts_2=mean(non_sts_2,na.rm = T),
non_sts_3=mean(non_sts_3,na.rm = T),
sts_1=sum(sts_1*0.25,na.rm=T),
sts_2=sum(sts_2*0.25,na.rm=T))
Thanks in advance...
Upvotes: 1
Views: 88
Reputation: 160447
@akrun's answer is straight-forward. If you prefer to not calculate unnecessarily, however, you can define a function that discriminates directly:
library(dplyr)
mysumm <- function(x, na.rm = FALSE) {
if (all(x %in% 0:1)) {
sum(x * 0.25, na.rm = na.rm)
} else {
mean(x, na.rm = na.rm)
}
}
df %>%
group_by(year, month_) %>%
summarise_if(is.numeric, mysumm, na.rm = TRUE)
# # A tibble: 3 x 7
# # Groups: year [?]
# year month_ sts_1 sts_2 non_sts_1 non_sts_2 non_sts_3
# <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2014 Jan 0.25 0.25 1.5 2.5 13.0
# 2 2015 Feb 0.00 NaN 10.0 6.0 9.0
# 3 2015 Jan 0.50 0.25 19.0 12.0 9.5
Upvotes: 5
Reputation: 887118
We can use summarise_all
and then remove the extra columns
df %>%
group_by(year, month_) %>%
summarise_all(funs(mean(., na.rm = TRUE), sum(.*0.25, na.rm = TRUE))) %>%
select(matches("month_|non_sts.*mean|\\bsts.*sum"))
# A tibble: 3 x 7
# Groups: year [2]
# year month_ non_sts_1_mean non_sts_2_mean non_sts_3_mean sts_1_sum sts_2_sum
# <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2014 Jan 1.5 2.5 13.0 0.25 0.25
#2 2015 Feb 10.0 6.0 9.0 0.00 0.00
#3 2015 Jan 19.0 12.0 9.5 0.50 0.25
Another approach if we have multiple sets of functions to be applied on different set of columns, will be to approach by applying the functions on different blocks of columns separately and then join
library(tidyverse)
flist <- list(function(x) mean(x, na.rm = TRUE), function(x) sum(x*0.25, na.rm = TRUE))
nm1 <- c("^non_sts", "^sts")
map2(nm1, flist, ~df %>%
group_by(year, month_) %>%
summarise_at(vars(matches(.x)), funs(.y))) %>%
reduce(inner_join, by = c('year', 'month_'))
# A tibble: 3 x 7
# Groups: year [?]
# year month_ non_sts_1 non_sts_2 non_sts_3 sts_1 sts_2
# <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2014 Jan 1.5 2.5 13.0 0.25 0.25
#2 2015 Feb 10.0 6.0 9.0 0.00 0.00
#3 2015 Jan 19.0 12.0 9.5 0.50 0.25
NOTE: This approach is flexible to use for any set of columns
If we were to modify the approach for 0:1 case
l1 <- df %>%
summarise_at(3:7, funs(all(. %in% c(0, 1, NA)))) %>%
unlist
nm1 <- split(names(df)[-(1:2)], l1)
and then apply as above by removing the matches
Upvotes: 4