thothal

Reputation: 20329

Control output format of do

The following two do statements deliver slightly different results:

library(dplyr)
set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))

(r1 <- d %>% group_by(w) %>%
   do(data.frame(s1 = sum(.$x),
                 s2 = sum(.$y),
                 s3 = {
                    z <- seq_along(.$x)
                    sum(z)
                 })))
# Source: local data frame [3 x 4]
# Groups: w [3]
# 
#        w        s1         s2    s3
#   (fctr)     (dbl)      (dbl) (int)
# 1      1 0.1292572  0.8447634    45
# 2      2 0.2092895  3.3060157    91
# 3      3 2.1351984 -0.1675416    36

(r2 <- d %>% group_by(w) %>%
   do(s1 = sum(.$x),
      s2 = sum(.$y),
      s3 = {
         z <- seq_along(.$x)
         sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w       s1       s2       s3
#   (fctr)    (chr)    (chr)    (chr)
# 1      1 <dbl[1]> <dbl[1]> <int[1]>
# 2      2 <dbl[1]> <dbl[1]> <int[1]>
# 3      3 <dbl[1]> <dbl[1]> <int[1]>

If I now want to add a more complex object to the output, I have to rely on the second form:

(r3 <- d %>% group_by(w) %>%
   do(s1 = lm(y ~ x, .),
      s2 = sum(.$y),
      s3 = {
         z <- seq_along(.$x)
         sum(z)
      }))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w      s1       s2       s3
#   (fctr)   (chr)    (chr)    (chr)
# 1      1 <S3:lm> <dbl[1]> <int[1]>
# 2      2 <S3:lm> <dbl[1]> <int[1]>
# 3      3 <S3:lm> <dbl[1]> <int[1]>

So my question is: is there an elegant way to combine the nice output of the unnamed form of do (in particular, that vectors are stored as plain vector columns and not as list columns) with the named form's ability to store more complicated objects? The desired output would look like this, but without the need for the extra mutate:

r3 %>% mutate(s2 = unlist(s2), s3 = unlist(s3))
# Source: local data frame [3 x 4]
# Groups: <by row>
# 
#        w      s1         s2    s3
#   (fctr)   (chr)      (dbl) (int)
# 1      1 <S3:lm>  0.8447634    45
# 2      2 <S3:lm>  3.3060157    91
# 3      3 <S3:lm> -0.1675416    36

Edit

This question is apparently not valid anymore, since in my current dplyr version I get list instead of chr.

Finally, why are s1, s2 and s3 in the second example of type (chr)?

Upvotes: 4

Views: 100

Answers (1)

Hong Ooi

Reputation: 57686

Wrap the model in a list, and protect that list with I() so that data.frame() does not try to unpack it.

(r3 <- d %>% group_by(w) %>%
    do(data.frame(s1 = I(list(lm(y ~ x, .))),
                  s2 = sum(.$y),
                  s3 = {
                     z <- seq_along(.$x)
                     sum(z)
                  })))

#Source: local data frame [3 x 4]
#Groups: w [3]

#       w      s1         s2    s3
#  (fctr)   (chr)      (dbl) (int)
#1      1 <S3:lm>  0.8447634    45
#2      2 <S3:lm>  3.3060157    91
#3      3 <S3:lm> -0.1675416    36
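
The effect of I() can be seen in isolation with base R alone. The following is a minimal sketch, not part of the code above: it assumes the built-in cars data set, and the names fit, bad and ok are purely illustrative. It only shows that a one-element list wrapped in I() ends up as a single cell of a list column.

```r
# Fit any model; `cars` ships with base R, `fit` is just an illustrative name.
fit <- lm(dist ~ speed, data = cars)

# Without I(), data.frame() tries to turn the list's contents into columns,
# which is expected to fail for an lm object, so the call is wrapped in try().
bad <- try(data.frame(s1 = list(fit)), silent = TRUE)

# With I(), the one-element list is kept as is and becomes a one-row list column.
ok <- data.frame(s1 = I(list(fit)), s2 = sum(cars$dist))

nrow(ok)           # number of rows in the result
class(ok$s1[[1]])  # class of the object stored in the cell
is.numeric(ok$s2)  # whether the scalar stayed an ordinary numeric column
```

Inside do(), each group produces one such single-row data frame, and the rows are then stacked.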

(The printed type chr is a bug in print.tbl_df, since fixed in dplyr 0.5. Don't worry about it.)
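
To see what a column really holds, independent of the printed header, its storage type can be inspected directly. This is a small sketch that assumes dplyr is installed and that do() is still available in the installed version; it reuses the data from the question with a single named result.

```r
library(dplyr)

set.seed(1)
d <- data.frame(x = rnorm(30), y = rnorm(30), w = factor(sample(3, 30, TRUE)))

# Named form of do(): every result is stored in a list column.
r2 <- d %>% group_by(w) %>% do(s2 = sum(.$y))

typeof(r2$s2)      # storage type of the column itself
class(r2$s2[[1]])  # class of the first stored element
```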

Upvotes: 4
