Reputation: 1875
I have the following table with sample data:
set.seed(10)
dat <- data.frame(grp1 = sample(c(0, 1), size = 5, replace = TRUE),
grp2 = sample(c(0, 1), size = 5, replace = TRUE),
grp3 = sample(c(0, 1), size = 5, replace = TRUE),
value = round(runif(5, min = 0, max = 10), 0) )
dat
grp1 grp2 grp3 value
1 0 0 0 4
2 0 1 1 1
3 1 1 0 3
4 1 0 0 4
5 1 0 1 8
But in the end, I need a tibble like this:
group mean
-----------------------
grp1 5.0
grp2 2.0
grp3 4.5
Usually, I would do a pivot_longer(cols = starts_with("grp"))
, so I could easily group_by(name)
and calculate the groupwise means using summarise(mean = mean(value, na.rm=TRUE))
. But here my problem is, that each of the 5 elements can belong to multiple groups (grp1, grp2, grp3), so there's some overlapping between the groups. My naive attempt would be to calculate the means for each group separately and then rbind()
them. But as the number of groups increases this gets annoying, so I am looking for a way to automate this.
Upvotes: 3
Views: 221
Reputation: 101433
Another base R option (similar to the answer by @akrun)
> rev(stack(colMeans((NA^(1 - dat[startsWith(names(dat), "grp")])) * dat$value, na.rm = TRUE)))
ind values
1 grp1 5.0
2 grp2 2.0
3 grp3 4.
Upvotes: 1
Reputation: 887148
Using base R
stack(colMeans(replace(dat[-4], dat[-4] == 0, NA) * dat$value, na.rm = TRUE))[2:1]
ind values
1 grp1 5.0
2 grp2 2.0
3 grp3 4.5
Upvotes: 2
Reputation: 52004
colSums(dat$value * dat[-ncol(dat)]) / colSums(dat[-ncol(dat)])
data.frame(grp = names(d), mean = unname(d))
# grp mean
#1 grp1 5.0
#2 grp2 2.0
#3 grp3 4.5
Upvotes: 3
Reputation: 39154
Here is one option. The solution is mainly based on base R. I only created a tibble
using dplyr
in the end.
library(dplyr)
dat2 <- dat[, -ncol(dat)] * dat[, ncol(dat)]
dat3 <- sapply(dat2, function(x) mean(x[x != 0]))
dat4 <- tibble(
group = names(dat3),
mean = dat3
)
dat4
# # A tibble: 3 x 2
# group mean
# <chr> <dbl>
# 1 grp1 5
# 2 grp2 2
# 3 grp3 4.5
Upvotes: 4
Reputation: 51592
An idea can be,
library(dplyr)
library(tidyr)
dat %>%
pivot_longer(grp1:grp3, names_repair = 'unique') %>%
filter(value...3 == 1) %>%
group_by(name) %>%
summarise(means = mean(value...1))
New names:
* value -> value...1
* value -> value...3
# A tibble: 3 x 2
name means
<chr> <dbl>
1 grp1 5
2 grp2 2
3 grp3 4.5
Upvotes: 4