I have a data frame with several variables whose means I want, and one variable I want to group by. I would then like the proportion of each group's mean to the overall mean (group mean divided by overall mean).
I have put together the following, but it is clumsy.
How would you go about it using dplyr or data.table? Bonus points for an option that returns both the intermediate step (group and overall means) and the final proportions.
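library(tidyverse)
set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y = sample(c("yes", "no"), 10, replace = TRUE)
)
# mean of every variable within each level of Y
# (funs() is deprecated in current dplyr, so the function is passed directly)
groupMeans <- Data %>%
  group_by(Y) %>%
  summarize_all(mean)
# mean of every variable over the whole data set
overallMeans <- Data %>%
  select(-Y) %>%
  summarize_all(mean)
# divide each column of group means by the matching overall mean
index <- sweep(as.matrix(groupMeans[, -1]), MARGIN = 2, as.matrix(overallMeans), FUN = "/")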
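For the "bonus" part, one option is to keep everything in a single long table with one row per group and variable. This is a sketch, assuming dplyr 1.0.0 or later (for `.groups`) and tidyr 1.0.0 or later (for `pivot_longer()`); the column names `variable`, `value`, `overall_mean`, `group_mean` and `proportion` are illustrative choices, not names from the question. The overall mean is attached before grouping by `Y`, so it is computed over all rows.

```r
library(dplyr)
library(tidyr)

set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y  = sample(c("yes", "no"), 10, replace = TRUE)
)

long_summary <- Data %>%
  # one row per (observation, variable) pair
  pivot_longer(-Y, names_to = "variable", values_to = "value") %>%
  # overall mean of each variable, computed across all groups
  group_by(variable) %>%
  mutate(overall_mean = mean(value)) %>%
  # group mean of each variable within each level of Y
  group_by(Y, variable, overall_mean) %>%
  summarise(group_mean = mean(value), .groups = "drop") %>%
  # final step: ratio of group mean to overall mean
  mutate(proportion = group_mean / overall_mean)

long_summary
```

If the wide layout is preferred, `pivot_wider(long_summary, names_from = variable, values_from = c(group_mean, overall_mean, proportion))` should spread it back into one row per group.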
Here is one more dplyr solution:
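index <- as.data.frame(Data %>%
  group_by(Y) %>%
  summarise_all(mean) %>%
  select(-Y) %>%
  # append the overall means as a third row
  rbind(Data %>% select(-Y) %>% summarise_all(mean)) %>%
  # divide every column by its value in row 3 (the overall mean)
  mutate_all(~ . / .[3]))[1:2, ]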
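The `.[3]` and `[1:2, ]` above assume exactly two groups, and the `Y` labels are dropped. Below is a sketch of the same stack-then-divide idea that should work for any number of groups, assuming dplyr 1.0.0 or later (`across()`); the `"overall"` label is an arbitrary marker chosen here and must not clash with a real level of `Y`.

```r
library(dplyr)

set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y  = sample(c("yes", "no"), 10, replace = TRUE)
)

stacked <- bind_rows(
  # one row of means per group (across() skips the grouping column)
  Data %>% group_by(Y) %>% summarise(across(everything(), mean), .groups = "drop"),
  # one extra row holding the overall means, labelled "overall"
  Data %>% summarise(across(-Y, mean)) %>% mutate(Y = "overall")
)

index <- stacked %>%
  # divide every mean by the "overall" row's value in the same column
  mutate(across(-Y, ~ .x / .x[Y == "overall"])) %>%
  # keep only the per-group proportions
  filter(Y != "overall")

index
```

Because the divisor is looked up by label rather than by row position, adding or removing groups needs no further changes, and `stacked` still holds the intermediate means.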
Here's a method with data.table:
# data
library(data.table)
set.seed(1)
dt <- data.table(
  x1 = sample(1:10),
  x2 = sample(11:20),
  x3 = sample(21:30),
  y = sample(c("yes", "no"), 10, replace = TRUE)
)
# group means
group_means <- dt[ , lapply(.SD, mean), by = y, .SDcols = 1:3]
# overall means
overall_means <- dt[ , lapply(.SD, mean), .SDcols = 1:3]
# clunky combination (sorry!)
group_means[ , perc_x1 := x1 / overall_means[[1]] ]
group_means[ , perc_x2 := x2 / overall_means[[2]] ]
group_means[ , perc_x3 := x3 / overall_means[[3]] ]
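The three `perc_*` lines can be collapsed into a single by-reference assignment that loops over however many columns there are. A sketch, assuming a reasonably recent data.table; `cols` and `new_cols` are helper names introduced here, and `Map()` is base R pairing each group-mean column with the matching overall mean.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  x1 = sample(1:10),
  x2 = sample(11:20),
  x3 = sample(21:30),
  y  = sample(c("yes", "no"), 10, replace = TRUE)
)

cols     <- c("x1", "x2", "x3")     # variables to summarise
new_cols <- paste0("perc_", cols)   # names for the ratio columns

group_means   <- dt[, lapply(.SD, mean), by = y, .SDcols = cols]
overall_means <- dt[, lapply(.SD, mean), .SDcols = cols]

# Map() divides each group-mean column by the matching overall mean;
# := then adds all ratio columns to group_means by reference in one step
group_means[, (new_cols) := Map(`/`, .SD, overall_means), .SDcols = cols]

group_means
```

`group_means` keeps the group means next to the ratios, and `overall_means` is still available for the intermediate step.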
Here is one possible dplyr solution that contains everything you want:
Data %>%
  group_by(Y) %>%
  summarise(
    group_avg_X1 = mean(X1),
    group_avg_X2 = mean(X2),
    group_avg_X3 = mean(X3)
  ) %>%
  mutate(
    # take overall means from the ungrouped data; the mean of the group
    # means only equals the overall mean when all groups have the same size
    overall_avg_X1 = mean(Data$X1),
    overall_avg_X2 = mean(Data$X2),
    overall_avg_X3 = mean(Data$X3),
    proportion_X1 = group_avg_X1 / overall_avg_X1,
    proportion_X2 = group_avg_X2 / overall_avg_X2,
    proportion_X3 = group_avg_X3 / overall_avg_X3
  )
# # A tibble: 2 x 10
# Y group_avg_X1 group_avg_X2 group_avg_X3 overall_avg_X1 overall_avg_X2 overall_avg_X3 proportion_X1
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 no 6.6 14.6 25.8 5.5 15.5 25.5 1.2
# 2 yes 4.4 16.4 25.2 5.5 15.5 25.5 0.8
# # ... with 2 more variables: proportion_X2 <dbl>, proportion_X3 <dbl>
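Writing out every `X` column by hand gets tedious as the number of variables grows. A sketch of the same wide output using `across()` and `cur_column()`, assuming dplyr 1.0.0 or later; the `overall` vector and the `overall_`/`prop_` prefixes are names chosen here for illustration. The overall means come from the ungrouped data, so the result stays correct when the groups have different sizes.

```r
library(dplyr)

set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y  = sample(c("yes", "no"), 10, replace = TRUE)
)

# overall means from the ungrouped data, as a named numeric vector
overall <- colMeans(Data[c("X1", "X2", "X3")])

result <- Data %>%
  group_by(Y) %>%
  # intermediate step 1: group means, keeping the original column names
  summarise(across(X1:X3, mean), .groups = "drop") %>%
  mutate(
    # intermediate step 2: the overall mean, repeated on every row
    across(X1:X3, ~ rep(overall[[cur_column()]], length(.x)), .names = "overall_{.col}"),
    # final step: group mean divided by overall mean
    across(X1:X3, ~ .x / overall[[cur_column()]], .names = "prop_{.col}")
  )

result
```

Weighting each group's proportion by its group size and averaging should give exactly 1, which is a quick sanity check on the result.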