Predicted values from a series of linear models

Question

Hi there. I have a series of linear models stored in a data frame, built with tidyr and dplyr as shown below. How would I go about generating predicted values from each model with a fixed set of newdata? In reality I have 10 dependent variables, but only two independent variables.

library(dplyr)
library(tidyr)
#random data
x1<-rnorm(100, mean=10, sd=5)
x2<-rnorm(100, mean=5, sd=2 )
y1<-rnorm(100, mean=5, sd=1)
y2<-rnorm(100, mean=3, sd=1)
#create test data frame
df<-data.frame(y1, y2, x1, x2)
#create models: reshape to long format, then fit one lm per dependent variable
df%>%
  gather(dv, value, y1, y2, -x1,-x2) %>%
  group_by(dv)%>%
  do(mod=lm(value~x1+x2, data=.))

aosmith · Accepted Answer

One option would be to get the predictions as a column in a data.frame using do. The difference from the other answer is the use of data.frame to get the predictions in a column. You can add in the dv variable to this dataset to keep things straight.

df %>%
    gather(dv, value, y1, y2, -x1,-x2) %>%
    group_by(dv)%>%
    do(mod=lm(value ~ x1 + x2, data=.)) %>%
        do(data.frame(dv = .$dv, pred = predict(.$mod, newdata = df)))

Source: local data frame [200 x 2]
Groups: 

      dv     pred
   (chr)    (dbl)
1     y1 4.936012
2     y1 4.948939
3     y1 4.992472
4     y1 4.733290
5     y1 4.921581
6     y1 5.115699
7     y1 4.981135
8     y1 4.837326
9     y1 4.641484
10    y1 4.739197
..   ...      ...
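
Since the question asks about a fixed set of newdata, here is a minimal sketch of the same approach with a separate prediction set instead of df. The name newdat and its values are made up for illustration; the only requirement is that it contains the predictors x1 and x2. The set.seed call is also my addition, there only to make the sketch reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(y1 = rnorm(100, mean = 5, sd = 1),
                 y2 = rnorm(100, mean = 3, sd = 1),
                 x1 = rnorm(100, mean = 10, sd = 5),
                 x2 = rnorm(100, mean = 5, sd = 2))

# Hypothetical fixed prediction set: only the predictor columns are needed
newdat <- data.frame(x1 = c(5, 10, 15), x2 = c(3, 5, 7))

preds <- df %>%
    gather(dv, value, y1, y2) %>%
    group_by(dv) %>%
    do(mod = lm(value ~ x1 + x2, data = .)) %>%
    # one row per model here, so .$mod is a single fitted lm
    do(data.frame(dv = .$dv,
                  pred = predict(.$mod, newdata = newdat),
                  stringsAsFactors = FALSE))
```

The result should have one row per model for each row of newdat, so 2 x 3 = 6 rows with this prediction set.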

The downside of that (to me) is that the data used for the predictions isn't stored alongside the predicted values. You could certainly cbind the predictions to the prediction dataset, but another useful option is to use augment from package broom within do. In this second alternative I call augment within the first call to do, although that's not required.

You can supply the dataset you want to predict from, and have the predictions added to, through the newdata argument of augment. In this example I used the dataset df2 (just the independent variable columns of your df dataset).

library(broom)
df2 = df[ , 3:4] # Dataset for predictions
df %>%
    gather(dv, value, y1, y2, -x1,-x2) %>%
    group_by(dv)%>%
    do( augment(lm(value ~ x1 + x2, data=.), newdata = df2) )

Source: local data frame [200 x 5]
Groups: dv [2]

      dv        x1       x2  .fitted   .se.fit
   (chr)     (dbl)    (dbl)    (dbl)     (dbl)
1     y1  5.863764 6.201406 4.936012 0.1521102
2     y1  4.419014 7.028888 4.948939 0.1936563
3     y1  7.917369 6.081930 4.992472 0.1255001
4     y1  4.338864 4.019565 4.733290 0.1842635
5     y1 13.307611 2.674705 4.921581 0.1757911
6     y1 14.986879 4.666154 5.115699 0.1614377
7     y1 12.941636 3.679022 4.981135 0.1409247
8     y1  7.474526 4.088868 4.837326 0.1310659
9     y1  2.136858 3.706184 4.641484 0.2357699
10    y1  9.307190 1.885127 4.739197 0.2008851
..   ...       ...      ...      ...       ...
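
For comparison, the cbind route mentioned earlier might look something like the sketch below: the prediction set is bound to the predictions inside the second do call, so each predicted value sits next to the x1 and x2 values that produced it. Again, newdat, preds_with_x and the seed are hypothetical names and values chosen for illustration, not part of the original data.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(y1 = rnorm(100, mean = 5, sd = 1),
                 y2 = rnorm(100, mean = 3, sd = 1),
                 x1 = rnorm(100, mean = 10, sd = 5),
                 x2 = rnorm(100, mean = 5, sd = 2))

# Hypothetical fixed prediction set
newdat <- data.frame(x1 = c(5, 10, 15), x2 = c(3, 5, 7))

preds_with_x <- df %>%
    gather(dv, value, y1, y2) %>%
    group_by(dv) %>%
    do(mod = lm(value ~ x1 + x2, data = .)) %>%
    # bind the model label, the predictors and the predictions column-wise
    do(data.frame(dv = .$dv,
                  newdat,
                  pred = predict(.$mod, newdata = newdat),
                  stringsAsFactors = FALSE))
```

This gives the columns dv, x1, x2 and pred, which is close to what augment returns minus the standard errors. If you also want standard errors, augment is probably the less fiddly choice.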
