DGenchev

Reputation: 367

Calculate Group Mean and Overall Mean

I have a data frame with several numeric variables and one grouping variable. I want the mean of each variable within each group, and then the ratio of each group mean to the overall mean of that variable.

I have put together the following, but it is clumsy.

How would you go about it using dplyr or data.table? Bonus points for the option to return both the intermediate step (group and overall mean) and the final proportions.

library(tidyverse)

set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y = sample(c("yes", "no"), 10, replace = TRUE)
)

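# mean of every variable within each level of Y
groupMeans <- Data %>% 
  group_by(Y) %>%
  summarize_all(mean)

# mean of every variable over the whole data frame
overallMeans <- Data %>% 
  select(-Y) %>% 
  summarize_all(mean)

# divide each column of group means by the matching overall mean
index <- sweep(as.matrix(groupMeans[, -1]), MARGIN = 2, as.matrix(overallMeans), FUN = "/")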

Upvotes: 1

Views: 1039

Answers (3)

Nar

Reputation: 658

Here is one more dplyr solution:

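index <- as.data.frame(Data %>% 
    group_by(Y) %>%
    summarise_all(mean) %>%
    select(-Y) %>%
    # append the overall means as the last row (row 3 when Y has two groups)
    rbind(Data %>% select(-Y) %>% summarise_all(mean)) %>%
    # divide every row by the overall-mean row, then keep only the group rows
    mutate_all(~ . / .[3]))[1:2,]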

Upvotes: 3

DanY

Reputation: 6073

Here's a method with data.table:

#data
library(data.table)
set.seed(1)
dt <- data.table(
  x1 = sample(1:10),
  x2 = sample(11:20),
  x3 = sample(21:30),
  y = sample(c("yes", "no"), 10, replace = TRUE)
)

# group means
group_means <- dt[ , lapply(.SD, mean), by=y, .SDcols=1:3]

# overall means
overall_means <- dt[ , lapply(.SD, mean), .SDcols=1:3]

# clunky combination (sorry!)
group_means[ , perc_x1 := x1 / overall_means[[1]] ]
group_means[ , perc_x2 := x2 / overall_means[[2]] ]
group_means[ , perc_x3 := x3 / overall_means[[3]] ]
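The three near-identical assignments could probably be collapsed into one. Below is a minimal sketch, assuming the same `dt` as above; the helper vectors `cols` and `perc_cols` are names introduced here for illustration and are not part of the original answer. `Map()` pairs each group-mean column with the matching overall mean and divides them.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  x1 = sample(1:10),
  x2 = sample(11:20),
  x3 = sample(21:30),
  y = sample(c("yes", "no"), 10, replace = TRUE)
)

# columns to summarise and the names for the ratio columns (illustrative names)
cols <- c("x1", "x2", "x3")
perc_cols <- paste0("perc_", cols)

# intermediate results: group means and overall means
group_means <- dt[, lapply(.SD, mean), by = y, .SDcols = cols]
overall_means <- dt[, lapply(.SD, mean), .SDcols = cols]

# divide each group-mean column by the overall mean of the same variable
group_means[, (perc_cols) := Map(`/`, .SD, overall_means), .SDcols = cols]
```

After this, `group_means` holds the group means alongside the `perc_*` ratios, and `overall_means` is still available as a separate one-row table.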

Upvotes: 0

ozanstats

Reputation: 2864

Here is one possible dplyr solution that contains everything you want:

Data %>% 
  group_by(Y) %>%
  summarise(
    group_avg_X1 = mean(X1),
    group_avg_X2 = mean(X2),
    group_avg_X3 = mean(X3)
  ) %>%
  mutate(
    # overall means come from the full data; the mean of the group means
    # only matches them when all groups have the same size
    overall_avg_X1 = mean(Data$X1),
    overall_avg_X2 = mean(Data$X2),
    overall_avg_X3 = mean(Data$X3),
    proportion_X1 = group_avg_X1 / overall_avg_X1,
    proportion_X2 = group_avg_X2 / overall_avg_X2,
    proportion_X3 = group_avg_X3 / overall_avg_X3
  )

# # A tibble: 2 x 10
#   Y     group_avg_X1 group_avg_X2 group_avg_X3 overall_avg_X1 overall_avg_X2 overall_avg_X3 proportion_X1
#   <fct>        <dbl>        <dbl>        <dbl>          <dbl>          <dbl>          <dbl>         <dbl>
# 1 no             6.6         14.6         25.8            5.5           15.5           25.5           1.2
# 2 yes            4.4         16.4         25.2            5.5           15.5           25.5           0.8
# # ... with 2 more variables: proportion_X2 <dbl>, proportion_X3 <dbl>
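With more than a handful of variables, naming every column by hand gets tedious. One way to generalise the idea is sketched below, assuming dplyr 1.0.0 or later (for `across()` and `cur_column()`); the object names `group_means`, `overall_means` and `proportions` are illustrative. It keeps both intermediate results and the final ratios as separate objects.

```r
library(dplyr)

set.seed(1)
Data <- data.frame(
  X1 = sample(1:10),
  X2 = sample(11:20),
  X3 = sample(21:30),
  Y = sample(c("yes", "no"), 10, replace = TRUE)
)

# intermediate step 1: mean of every variable within each level of Y
group_means <- Data %>%
  group_by(Y) %>%
  summarise(across(everything(), mean))

# intermediate step 2: mean of every variable over the whole data frame
overall_means <- Data %>%
  summarise(across(-Y, mean))

# final step: divide each group mean by the overall mean of the same column
proportions <- group_means %>%
  mutate(across(-Y, ~ .x / overall_means[[cur_column()]]))
```

Because the overall means are computed from the ungrouped data, the ratios remain correct when the groups differ in size.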

Upvotes: 0
