Ignacio
Ignacio

Reputation: 7938

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.


Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the good old plyr I can create a little table summarizing my data with the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output look like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33

I'm trying to move my code to dplyr and the %>% operator. My code takes DF then group it by group and sex and then summarise it. That is:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?

Upvotes: 18

Views: 11624

Answers (2)

Carlos Cinelli
Carlos Cinelli

Reputation: 11617

The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. When that happens you get this warning:

library(plyr)
    Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize

So in order for your code to work, either detach plyr detach(package:plyr) or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

Upvotes: 25

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193657

Your code is calling plyr::summarise instead of dplyr::summarise due to the order in which you have loaded "plyr" and "dplyr".

Demo:

library(dplyr) ## I'm guessing this is the order you loaded
library(plyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
#    mean   sd
# 1 36.88 9.76
dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
# Source: local data frame [6 x 4]
# Groups: group
# 
#   group sex  mean    sd
# 1     A   F 32.17  6.30
# 2     A   M 30.98  7.37
# 3     B   F 38.20  7.67
# 4     B   M 33.12 12.24
# 5     C   F 43.91 10.31
# 6     C   M 47.53  8.25

Upvotes: 4

Related Questions