kittygirl

Reputation: 2443

How to use dplyr's pipe with aggregate

df<-data.frame(gender = c('A', 'B', 'B','B','A'),q01 = c(1, 6, 3,8,5),q02 = c(5, 3, 6,5,2)) 
  gender q01 q02
1      A   1   5
2      B   6   3
3      B   3   6
4      B   8   5
5      A   5   2

I want to calculate q01*2+q02 for each row and then take the mean by gender group. The expected result is:

A 9.5
B 16

I tried but failed:

 df %>% aggregate(c(q01,q02)~gender,mean(q01*2+q02))

Error in mean(q01 * 2 + q02) : object 'q01' not found

df %>% group_by(gender) %>% mean(.$q01*2+.$q02)
[1] NA

Warning message: In mean.default(., .$q01 * 2 + .$q02) : argument is not numeric or logical: returning NA

What's the problem?

Upvotes: 1

Views: 2837

Answers (3)

akrun

Reputation: 887048

In the OP's dplyr + aggregate code, the data argument is not specified, and c() is used, which concatenates the two columns into one vector. Even with the data supplied, the call fails:

aggregate(c(q01,q02)~gender,df, mean(q01*2+q02))

Error in model.frame.default(formula = c(q01, q02) ~ gender, data = df) : variable lengths differ (found for 'gender')

Here, c(q01, q02) works like concatenating c(1:5, 6:10): the result is twice as long as gender, hence the "variable lengths differ" error. In addition, the expression passed as FUN cannot be evaluated, because 'q01' and 'q02' are not found outside the data frame.
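The length mismatch can be checked directly. This is a minimal sketch using the df from the question; the names lhs_c and lhs_cbind are introduced here only for illustration.

```r
# Data from the question
df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# c() flattens both columns into one vector of length 10
lhs_c <- with(df, c(q01, q02))

# cbind() keeps one row per observation (a 5 x 2 matrix)
lhs_cbind <- with(df, cbind(q01, q02))

length(lhs_c)     # 10, does not match the 5 values of gender
nrow(lhs_cbind)   # 5, matches nrow(df)
```

This is why cbind() is usable on the left-hand side of the formula while c() is not.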

Instead, we can use cbind to create the new column on the left-hand side of the formula method of aggregate and then get the mean by group

library(dplyr) 
df %>%
     aggregate(cbind(q = q01 * 2 + q02) ~ gender, data = ., mean)
#  gender    q
#1      A  9.5
#2      B 16.0

NOTE: With %>% (which dplyr re-exports from magrittr), the data from the lhs can be referred to with a dot (.), as in data = . above.
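As a small illustration of the dot placeholder, the sketch below compares a piped call with the equivalent direct call. It assumes dplyr is installed and uses the df from the question; res_dot and res_direct are names made up for this example.

```r
library(dplyr)  # provides %>%

df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# Without a dot, the lhs becomes the first argument: nrow(df)
n <- df %>% nrow()

# With a dot as a direct argument, the lhs goes where the dot is,
# so the formula stays first and df is passed as 'data'
res_dot <- df %>% aggregate(q01 ~ gender, data = ., FUN = mean)

# The same call written without the pipe
res_direct <- aggregate(q01 ~ gender, data = df, FUN = mean)

all.equal(res_dot, res_direct)
```

Both calls should give the same per-group means, since the pipe only changes where df is supplied.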

NOTE2: Here, we assume that the question is about how aggregate can be made to work with %>%. If the goal is just to get the mean, the whole process can be done with dplyr

f1 <- function(x, y, val) mean(x * val + y)
df %>%
    group_by(gender) %>%
    summarise(q = f1(q01, q02, 2))
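For comparison, the OP's second attempt returns NA because the pipe passes the grouped data frame itself as the first argument of mean(), and mean() does not know how to average a data frame. The sketch below reproduces that and contrasts it with summarise(), which evaluates the expression within each group. It assumes dplyr is installed; res_na and res are illustrative names.

```r
library(dplyr)

df <- data.frame(gender = c('A', 'B', 'B', 'B', 'A'),
                 q01 = c(1, 6, 3, 8, 5),
                 q02 = c(5, 3, 6, 5, 2))

# Roughly what the failed pipe amounts to: mean() called on a
# grouped data frame, which is neither numeric nor logical
res_na <- suppressWarnings(mean(group_by(df, gender)))

# summarise() evaluates the expression once per group instead
res <- df %>%
    group_by(gender) %>%
    summarise(q = mean(q01 * 2 + q02))

res_na   # NA
res$q    # 9.5 16.0
```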

Or using data.table methods

library(data.table)
setDT(df)[, .(q = mean(q01  * 2 + q02)), .(gender)]
#   gender    q
#1:      A  9.5
#2:      B 16.0

Or using base R with by

stack(by(df[-1], df[1], FUN = function(x) mean(x[,1] * 2 + x[,2])))

Or with aggregate

aggregate(cbind(q = q01 * 2 + q02) ~ gender, df, mean)

Upvotes: 3

NelsonGon

Reputation: 13309

Sticking with the same logic:

  df %>% 
   do(aggregate(I(q01*2)+q02~gender,
             data=.,mean)) %>% 
   setNames(.,nm=c("gender","q"))
  gender    q
1      A  9.5
2      B 16.0

NOTE: The lifecycle of do() is marked as questioning.

Upvotes: 2

Ronak Shah

Reputation: 388907

It is better to keep the dplyr and base approaches separate. Each of them has its own way of handling data. With dplyr you can do

library(dplyr)

df %>%
   mutate(q = q01 * 2 + q02) %>%
   group_by(gender) %>%
   summarise(q = mean(q))

#  gender     q
#  <fct>  <dbl>
#1 A        9.5
#2 B       16  

and using base R aggregate

aggregate(q~gender, transform(df, q = q01*2+q02), mean)

Upvotes: 2
