The following two do() statements give slightly different results:
library(dplyr)
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))
(r1 <- d %>% group_by(w) %>%
   do(data.frame(s1 = sum(.$x),
                 s2 = sum(.$y),
                 s3 = {
                   z <- seq_along(.$x)
                   sum(z)
                 })))
# Source: local data frame [3 x 4]
# Groups: w [3]
#
#        w        s1         s2    s3
#   (fctr)     (dbl)      (dbl) (int)
# 1      1 0.1292572  0.8447634    45
# 2      2 0.2092895  3.3060157    91
# 3      3 2.1351984 -0.1675416    36
(r2 <- d %>% group_by(w) %>%
   do(s1 = sum(.$x),
      s2 = sum(.$y),
      s3 = {
        z <- seq_along(.$x)
        sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
#
#        w       s1       s2       s3
#   (fctr)    (chr)    (chr)    (chr)
# 1      1 <dbl[1]> <dbl[1]> <int[1]>
# 2      2 <dbl[1]> <dbl[1]> <int[1]>
# 3      3 <dbl[1]> <dbl[1]> <int[1]>
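The <dbl[1]> and <int[1]> entries mean that the named form keeps each group's result as an element of a list column, while the unnamed form ends up with ordinary atomic columns. A rough base-R analogy (not what dplyr does internally) uses split() and lapply() for the list-valued version and vapply() for the atomic one; the names groups, s2_list and s2_vec are hypothetical:

```r
# Same toy data as in the question
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))

# One data frame per level of w (empty levels dropped)
groups <- split(d, d$w, drop = TRUE)

# Named-do style: each group's scalar is stored as an element of a list
s2_list <- lapply(groups, function(g) sum(g$y))

# Unnamed-do style: the per-group scalars form a plain double vector
s2_vec <- vapply(groups, function(g) sum(g$y), numeric(1))

# unlist() flattens the list of length-1 results back into an atomic vector
s2_flat <- unlist(s2_list)
```

Flattening only works cleanly because every element has length 1; a list holding lm objects cannot be simplified this way.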
If I now want to add a more complex object to the output, I have to rely on the second form:
(r3 <- d %>% group_by(w) %>%
   do(s1 = lm(y ~ x, .),
      s2 = sum(.$y),
      s3 = {
        z <- seq_along(.$x)
        sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
#
#        w      s1       s2       s3
#   (fctr)   (chr)    (chr)    (chr)
# 1      1 <S3:lm> <dbl[1]> <int[1]>
# 2      2 <S3:lm> <dbl[1]> <int[1]>
# 3      3 <S3:lm> <dbl[1]> <int[1]>
So my question is: is there an elegant way to combine the nice output of the unnamed form of do() (in particular, that vectors are stored as plain vectors and not as list columns) with the ability of the named form of do() to also store more complicated objects? The desired output would look like this, without the need for the extra mutate():
r3 %>% mutate(s2 = unlist(s2), s3 = unlist(s3))
# Source: local data frame [3 x 4]
# Groups: <by row>
#
#        w      s1         s2    s3
#   (fctr)   (chr)      (dbl) (int)
# 1      1 <S3:lm>  0.8447634    45
# 2      2 <S3:lm>  3.3060157    91
# 3      3 <S3:lm> -0.1675416    36
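For comparison, the same target shape (one row per group, the model in a list column, the scalar summaries in ordinary columns) can be sketched in base R without dplyr. This is a swapped-in technique, not the do() API: it assumes split() for grouping and relies on the fact that assigning a list of the right length with $<- creates a list column. The names groups and res are hypothetical:

```r
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))

# One data frame per level of w
groups <- split(d, d$w, drop = TRUE)

# Start with the grouping column, one row per group
res <- data.frame(w = factor(names(groups), levels = levels(d$w)))

# A list of length nrow(res) assigned with $<- becomes a list column
res$s1 <- lapply(groups, function(g) lm(y ~ x, data = g))

# vapply() guarantees atomic columns of the stated type
res$s2 <- vapply(groups, function(g) sum(g$y), numeric(1))
res$s3 <- vapply(groups, function(g) sum(seq_along(g$x)), integer(1))
```

Choosing vapply() over sapply() for the scalar columns is deliberate: it fails loudly if a group ever returns something other than a single value of the declared type.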
Edit
This question is apparently no longer valid, since with my current dplyr version I get list instead of chr.
Finally, why are s1, s2 and s3 in the second example of type (chr)?
Answer
Wrap the model in a list, and protect that list with I() so that data.frame() does not try to unpack it into separate columns:
r3 <- d %>% group_by(w) %>%
  do(data.frame(s1 = I(list(lm(y ~ x, .))),
                s2 = sum(.$y),
                s3 = {
                  z <- seq_along(.$x)
                  sum(z)
                }))
# Source: local data frame [3 x 4]
# Groups: w [3]
#        w      s1         s2    s3
#   (fctr)   (chr)      (dbl) (int)
# 1      1 <S3:lm>  0.8447634    45
# 2      2 <S3:lm>  3.3060157    91
# 3      3 <S3:lm> -0.1675416    36
(The printed type chr is a bug in print.tbl_df, since fixed in dplyr 0.5. Don't worry about it.)
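Why I() matters can be checked in plain base R, outside dplyr. A minimal sketch, assuming a single ungrouped data set; the names fit, plain and one_row are illustrative only. Without I(), data.frame() treats the list's element (the lm object) as something to convert into columns and stops with an error; with I(), the list is kept intact as a single one-row "AsIs" column next to an ordinary double column:

```r
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30))
fit <- lm(y ~ x, data = d)

# Without I(): data.frame() tries to coerce the lm object itself and fails
plain <- try(data.frame(s1 = list(fit), s2 = sum(d$y)), silent = TRUE)

# With I(): the list survives as a one-element list column
one_row <- data.frame(s1 = I(list(fit)), s2 = sum(d$y))
```

Inside do(), the unnamed form binds such one-row data frames together per group, which is why s2 and s3 keep their atomic types in the answer's output.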