Reputation: 2443
df<-data.frame(gender = c('A', 'B', 'B','B','A'),q01 = c(1, 6, 3,8,5),q02 = c(5, 3, 6,5,2))
  gender q01 q02
1      A   1   5
2      B   6   3
3      B   3   6
4      B   8   5
5      A   5   2
I want to calculate q01*2+q02 and then get the mean by gender group. The expected result is as below:
A 9.5
B 16
I tried but failed:
df %>% aggregate(c(q01,q02)~gender,mean(q01*2+q02))
Error in mean(q01 * 2 + q02) : object 'q01' not found
df %>% group_by(gender) %>% mean(.$q01*2+.$q02)
[1] NA
Warning message: In mean.default(., .$q01 * 2 + .$q02) : argument is not numeric or logical: returning NA
What's the problem?
Upvotes: 1
Views: 2837
Reputation: 887048
In the OP's dplyr + aggregate code, the data argument is not specified, and c is used, which concatenates the two columns into a single vector. Even with the data supplied, the call still fails:
aggregate(c(q01,q02)~gender,df, mean(q01*2+q02))
Error in model.frame.default(formula = c(q01, q02) ~ gender, data = df) : variable lengths differ (found for 'gender')
Here, c(q01, q02) is like concatenating c(1:5, 6:10): the result is twice as long as gender, which is why the variable lengths differ. In addition, the FUN argument mean(q01*2+q02) is not evaluated inside the data, so it cannot find 'q01' or 'q02'.
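To make the length mismatch concrete, here is a minimal sketch using the df from the question. The names stacked and combined are only illustrative.

```r
# Data frame from the question
df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# c() stacks the two columns end to end into one long vector
stacked <- with(df, c(q01, q02))
length(stacked)     # 10, twice the number of rows
length(df$gender)   # 5

# cbind() on the computed expression keeps one row per observation
combined <- with(df, cbind(q = q01 * 2 + q02))
nrow(combined)      # 5, matches gender
```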
Instead, we can use cbind to create a new column within the formula method of aggregate and then get the mean:
library(dplyr)
df %>%
  aggregate(cbind(q = q01 * 2 + q02) ~ gender, data = ., mean)
# gender q
#1 A 9.5
#2 B 16.0
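As a sanity check, the two group means can be reproduced by hand in base R. This is a sketch with tapply, used here only as a cross-check and not as part of the aggregate approach above.

```r
# Data frame from the question
df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# Row-wise score: 7 15 12 21 12
q <- df$q01 * 2 + df$q02

# Mean per gender: A = (7 + 12) / 2 = 9.5, B = (15 + 12 + 21) / 3 = 16
group_means <- tapply(q, df$gender, mean)
group_means
```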
NOTE: In dplyr, the data from the lhs of %>% can be specified with a . (dot).
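A small sketch of what the dot placeholder does, assuming dplyr is installed (it re-exports %>% from magrittr). The names piped and direct are illustrative only. Passing data = . in the piped call should give the same result as passing data = df directly.

```r
library(dplyr)  # provides %>%

# Data frame from the question
df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# The dot stands for the left-hand side of the pipe, here df
piped <- df %>%
  aggregate(cbind(q = q01 * 2 + q02) ~ gender, data = ., mean)

# Equivalent call without the pipe
direct <- aggregate(cbind(q = q01 * 2 + q02) ~ gender, data = df, mean)

all.equal(piped, direct)
```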
NOTE2: Here, we assume that the question is about how aggregate can be made to work with %>%. If the goal is just to get the mean, the whole process can be done with dplyr:
f1 <- function(x, y, val) mean(x * val + y)
df %>%
  group_by(gender) %>%
  summarise(q = f1(q01, q02, 2))
Or using data.table methods:
library(data.table)
setDT(df)[, .(q = mean(q01 * 2 + q02)), .(gender)]
# gender q
#1: A 9.5
#2: B 16.0
Or using base R with by:
stack(by(df[-1], df[1], FUN = function(x) mean(x[,1] * 2 + x[,2])))
Or with aggregate
aggregate(cbind(q = q01 * 2 + q02) ~ gender, df, mean)
Upvotes: 3
Reputation: 13309
Sticking with the same logic:
df %>%
  do(aggregate(I(q01*2)+q02~gender,
               data=.,mean)) %>%
  setNames(.,nm=c("gender","q"))
gender q
1 A 9.5
2 B 16.0
NOTE: I do note that do's lifecycle is marked as questioning.
Upvotes: 2
Reputation: 388907
It is better to keep the dplyr and base approaches separate. Each of them has its own way of handling data. With dplyr you can do:
library(dplyr)
df %>%
  mutate(q = q01 * 2 + q02) %>%
  group_by(gender) %>%
  summarise(q = mean(q))
# gender q
# <fct> <dbl>
#1 A 9.5
#2 B 16
and using base R aggregate
aggregate(q~gender, transform(df, q = q01*2+q02), mean)
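To illustrate why mixing the two styles trips up the question's second attempt: the pipe hands the whole grouped data frame to mean() as its first argument, so mean() sees a non-numeric x and returns NA. A minimal sketch, assuming dplyr is installed; grouped, res and out are illustrative names.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

grouped <- df %>% group_by(gender)

# mean() receives the grouped data frame itself as `x`, which is not
# numeric, so it warns and returns NA (warning suppressed here)
res <- suppressWarnings(mean(grouped))
is.na(res)

# summarise() evaluates the expression once per group instead
out <- grouped %>% summarise(q = mean(q01 * 2 + q02))
out
```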
Upvotes: 2