D. Studer
D. Studer

Reputation: 1875

R: calculate groupwise means with overlapping groups

I have the following table with sample data:

set.seed(10)
dat <- data.frame(grp1 = sample(c(0, 1), size = 5, replace = TRUE),
                   grp2 = sample(c(0, 1), size = 5, replace = TRUE),
                   grp3 = sample(c(0, 1), size = 5, replace = TRUE),
                   value = round(runif(5, min = 0, max = 10), 0) )
dat
  grp1 grp2 grp3 value
1    0    0    0     4
2    0    1    1     1
3    1    1    0     3
4    1    0    0     4
5    1    0    1     8

But in the end, I need a tibble like this:

group              mean
-----------------------
grp1                5.0    
grp2                2.0
grp3                4.5

Usually, I would do a pivot_longer(cols = starts_with("grp")), so I could easily group_by(name) and calculate the groupwise means using summarise(mean = mean(value, na.rm=TRUE)). But here my problem is, that each of the 5 elements can belong to multiple groups (grp1, grp2, grp3), so there's some overlapping between the groups. My naive attempt would be to calculate the means for each group separately and then rbind() them. But as the number of groups increases this gets annoying, so I am looking for a way to automate this.

Upvotes: 3

Views: 221

Answers (5)

ThomasIsCoding
ThomasIsCoding

Reputation: 101433

Another base R option (similar to the answer by @akrun)

> rev(stack(colMeans((NA^(1 - dat[startsWith(names(dat), "grp")])) * dat$value, na.rm = TRUE)))
   ind values
1 grp1    5.0
2 grp2    2.0
3 grp3    4.

Upvotes: 1

akrun
akrun

Reputation: 887148

Using base R

stack(colMeans(replace(dat[-4], dat[-4] == 0, NA) * dat$value, na.rm = TRUE))[2:1]
   ind values
1 grp1    5.0
2 grp2    2.0
3 grp3    4.5

Upvotes: 2

Ma&#235;l
Ma&#235;l

Reputation: 52004

colSums(dat$value * dat[-ncol(dat)]) / colSums(dat[-ncol(dat)])
data.frame(grp = names(d), mean = unname(d))

#   grp mean
#1 grp1  5.0
#2 grp2  2.0
#3 grp3  4.5

Upvotes: 3

www
www

Reputation: 39154

Here is one option. The solution is mainly based on base R. I only created a tibble using dplyr in the end.

library(dplyr)

dat2 <- dat[, -ncol(dat)] * dat[, ncol(dat)]

dat3 <- sapply(dat2, function(x) mean(x[x != 0]))

dat4 <- tibble(
  group = names(dat3),
  mean = dat3
)

dat4
# # A tibble: 3 x 2
#   group  mean
#   <chr> <dbl>
# 1 grp1    5  
# 2 grp2    2  
# 3 grp3    4.5

Upvotes: 4

Sotos
Sotos

Reputation: 51592

An idea can be,

library(dplyr)
library(tidyr)

dat %>% 
 pivot_longer(grp1:grp3, names_repair = 'unique') %>% 
 filter(value...3 == 1) %>% 
 group_by(name) %>% 
 summarise(means = mean(value...1))

New names:
* value -> value...1
* value -> value...3
# A tibble: 3 x 2
  name             means
  <chr>             <dbl>
1 grp1                5  
2 grp2                2  
3 grp3                4.5

Upvotes: 4

Related Questions