bigFin

Reputation: 13

What's the difference between these two statements (R / dplyr)?

I have a question about dplyr. Given the data frame my_data:

library(dplyr)
set.seed(20160229)
my_data = data.frame(
  y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x=c(rep('a', 2000), rep('b', 2000)),
  m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000)))

case 1:

pdat <- my_data %>%
  group_by(x, m) %>%
  do(data.frame(loc = density(.$y)$x,
                dens = density(.$y)$y))

and case 2:

pdat <- my_data
pdat  <- group_by(my_data, x, m)
do(data.frame(pdat,loc=density(pdat$y)$x),dens=density(pdat$y)$y)

Why are these statements different? How can case 2 be changed to match case 1?

Upvotes: 0

Views: 78

Answers (1)

Taylor H

Reputation: 436

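In case 2, your call to do() never receives the grouped data as its .data argument. The first thing you pass is the data.frame(...) expression, and the closing parenthesis is misplaced, so dens ends up as an argument to do() rather than to data.frame(). You need to either pipe the grouped data in, as in your "case 1", or provide it explicitly. Try something like: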

do(.data = pdat, data.frame(loc = density(.$y)$x, dens = density(.$y)$y))

And now they match:

a <- my_data %>%
  group_by(x, m) %>%
  do(data.frame(loc = density(.$y)$x,
                dens = density(.$y)$y))

b <- do(.data = pdat, data.frame(loc = density(.$y)$x, dens = density(.$y)$y))

identical(a, b)  # TRUE
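
For what it's worth, here is a self-contained sketch of the same comparison that you can paste into a fresh session. It relies on the fact that `lhs %>% f(...)` passes `lhs` as the first argument of `f`, which for do() is .data. The helper dens_frame is a hypothetical name of mine, not something from dplyr or from the question; it only exists so that density() is called once per group instead of twice.

```r
library(dplyr)

set.seed(20160229)
my_data <- data.frame(
  y = c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x = c(rep("a", 2000), rep("b", 2000)),
  m = c(rep("i", 1000), rep("j", 2000), rep("i", 1000))
)

# Hypothetical helper (not part of dplyr): estimate the density of y once
# for a single group's rows and return the grid points and density values.
dens_frame <- function(d) {
  est <- density(d$y)
  data.frame(loc = est$x, dens = est$y)
}

# Piped form: the grouped data becomes the first argument (.data) of do().
piped <- my_data %>%
  group_by(x, m) %>%
  do(dens_frame(.))

# Explicit form: the same grouped data is passed as .data by name.
explicit <- do(.data = group_by(my_data, x, m), dens_frame(.))

# Compare the two results.
all.equal(piped, explicit)
```

Inside do(), the dot stands for the rows of the current group, so both forms run dens_frame once per (x, m) combination.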

Upvotes: 1
