erc

Reputation: 10131

Predicting values with dplyr and augment

I'd like to fit models to a grouped data frame and then predict one new value per model (i.e. group).

library(dplyr)
library(broom)

data(iris)
dat <- rbind(iris, iris) 
dat$Group <- rep(c("A", "B"), each = 150)

new.dat <- data.frame(Group = rep(c("A", "B"), each = 3),
                      Species = rep(c("setosa", "versicolor", "virginica"), times = 2),
                      Sepal.Width = 1:6)
> new.dat
  Group    Species Sepal.Width
1     A     setosa           1
2     A versicolor           2
3     A  virginica           3
4     B     setosa           4
5     B versicolor           5
6     B  virginica           6

However, augment returns 36 rows (6 groups x 6 rows of new.dat), as if every new value were predicted with every model. How can I preserve the grouping here and get one fitted value per group?

dat %>%
  group_by(Species, Group) %>%
  do(augment(lm(Sepal.Length ~ Sepal.Width, data = .), newdata = new.dat))

# A tibble: 36 x 5
# Groups:   Species, Group [6]
   Group Species    Sepal.Width .fitted .se.fit
   <fct> <fct>            <int>   <dbl>   <dbl>
 1 A     setosa               1    3.33  0.221 
 2 A     versicolor           2    4.02  0.133 
 3 A     virginica            3    4.71  0.0512
 4 B     setosa               4    5.40  0.0615
 5 B     versicolor           5    6.09  0.145 
 6 B     virginica            6    6.78  0.234 
 7 A     setosa               1    3.33  0.221 
 8 A     versicolor           2    4.02  0.133 
 9 A     virginica            3    4.71  0.0512
10 B     setosa               4    5.40  0.0615
# ... with 26 more rows

(Note that the rows repeat here only because of how the example data is constructed; this is not the case with my original data.)

Upvotes: 1

Views: 879

Answers (1)

d125q

Reputation: 1666

You need to restrict new.dat to the rows whose Species and Group match those of the group currently being processed in do. A semi_join on the grouping columns does this:

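group.cols <- c("Species", "Group")
dat %>%
    # convert the column names to symbols before splicing; splicing the bare
    # strings would group by constant columns instead of Species and Group
    group_by(!!!syms(group.cols)) %>%
    # inside do(), `.` is the current group's data, so the semi_join keeps
    # only the new.dat rows that belong to that group
    do(augment(lm(Sepal.Length ~ Sepal.Width, data = .),
               newdata = semi_join(new.dat, ., by = group.cols)))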
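
For what it's worth, the same idea can probably also be written with group_modify, which hands the function the group's rows and a one-row key holding the current Species and Group. This is only a sketch, assuming a dplyr version that provides group_modify; the names d, key, fit, nd and res are mine and not from the question. group_modify does not allow the returned data frame to contain the grouping columns, so they are dropped before returning.

```r
library(dplyr)
library(broom)

# Same example data as in the question
dat <- rbind(iris, iris)
dat$Group <- rep(c("A", "B"), each = 150)

new.dat <- data.frame(Group = rep(c("A", "B"), each = 3),
                      Species = rep(c("setosa", "versicolor", "virginica"), times = 2),
                      Sepal.Width = 1:6)

res <- dat %>%
  group_by(Species, Group) %>%
  group_modify(function(d, key) {
    # d: rows of the current group (grouping columns removed)
    # key: one-row tibble with the current Species and Group
    fit <- lm(Sepal.Length ~ Sepal.Width, data = d)
    # keep only the new.dat rows that belong to this group
    nd <- semi_join(new.dat, key, by = c("Species", "Group"))
    # drop the grouping columns; group_modify adds them back itself
    augment(fit, newdata = nd) %>% select(-Species, -Group)
  })

res
```

With the example data this should give one row per Species/Group combination, i.e. 6 rows instead of 36.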

Upvotes: 1
