Reputation: 1382
I wrote this little function to find the R-squared value of a regression performed on two variables in the mtcars data set, which is included in R by default:
get_r_squared = function(x) summary(lm(mpg ~ hp, data = x))$r.squared
It seems to work as expected when I give it the full data set:
get_r_squared(mtcars)
# [1] 0.6024373
However, if I try to use it as part of a dplyr pipeline on a subset of the data, it returns the same answer as above three times, when I expected it to return a different value for each subset.
library(dplyr)
mtcars %>%
  group_by_("cyl") %>%
  summarise_(r_squared = get_r_squared(.))
## Source: local data frame [3 x 2]
##
##   cyl r_squared
## 1   4 0.6024373
## 2   6 0.6024373
## 3   8 0.6024373
I was expecting the values to look like this instead:
sapply(
  unique(mtcars$cyl),
  function(cyl) {
    get_r_squared(mtcars[mtcars$cyl == cyl, ])
  }
)
# [1] 0.01614624 0.27405583 0.08044919
I've confirmed that this is not a plyr namespace issue: that package is not loaded.
search()
## [1] ".GlobalEnv" "package:knitr" "package:dplyr"
## [4] "tools:rstudio" "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils" "package:datasets"
## [10] "package:methods" "Autoloads" "package:base"
I'm not sure what's going on here. Could it be related to nonstandard evaluation in the lm function? Or am I just misunderstanding how group_by works? Or perhaps something else?
Upvotes: 1
Views: 258
Reputation: 103898
I think you've misunderstood how summarise() works - it doesn't do anything with the dot (.), and the fact that it works at all is just happy chance. Instead, try something like this:
library(dplyr)
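# x is the response and y the predictor; summarise() passes each group's columns
get_r_squared <- function(x, y) summary(lm(x ~ y))$r.squared
mtcars %>%
  group_by(cyl) %>%
  summarise(r_squared = get_r_squared(mpg, hp))  # hp, to match the regression in the question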
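To see why the original pipeline repeated one value, it may help to check what the dot refers to inside summarise(). The sketch below assumes a reasonably recent dplyr; the names rows_seen and n_in_dot are made up for illustration. If the dot were the current group, the row counts would differ by cyl; instead each group should report the row count of the whole piped data frame.

```r
library(dplyr)

# `.` is bound by the pipe to the entire left-hand side (the grouped mtcars),
# so nrow(.) does not change from group to group.
rows_seen <- mtcars %>%
  group_by(cyl) %>%
  summarise(n_in_dot = nrow(.))

rows_seen
```

Because the dot is always the full data set, a function called as get_r_squared(.) fits the same model on all rows for every group.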
Upvotes: 2
Reputation: 887048
Try with do():
mtcars %>%
  group_by(cyl) %>%
  do(data.frame(r_squared = get_r_squared(.)))
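For comparison, here is a minimal base R sketch of the same per-group idea, with no dplyr involved. It reuses the one-argument get_r_squared from the question; by_cyl and r2 are names made up for illustration. Each piece of the split is an ordinary data frame, which is roughly what do() hands to the dot for each group.

```r
# The question's function: takes a data frame and fits mpg ~ hp on it.
get_r_squared <- function(x) summary(lm(mpg ~ hp, data = x))$r.squared

# Split mtcars into one data frame per cyl value, then fit within each piece.
by_cyl <- split(mtcars, mtcars$cyl)
r2 <- vapply(by_cyl, get_r_squared, numeric(1))

r2  # a named numeric vector, one R-squared per cyl level
```

The results are named by cyl level and sorted (4, 6, 8), so they appear in a different order from the sapply() call in the question, which follows unique(mtcars$cyl).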
Upvotes: 3