Reputation: 1907
Hi there: I have a series of linear models in a data frame constructed using tidyr and dplyr, as shown below. How would I go about generating predicted values from each model with a fixed set of newdata? In reality I have 10 dependent variables, but only two independent variables.
library(dplyr)
library(tidyr)
#random data
x1<-rnorm(100, mean=10, sd=5)
x2<-rnorm(100, mean=5, sd=2 )
y1<-rnorm(100, mean=5, sd=1)
y2<-rnorm(100, mean=3, sd=1)
#create test data frame
df<-data.frame(y1, y2, x1, x2)
#create models
df%>%
gather(dv, value, y1, y2, -x1,-x2) %>%
group_by(dv)%>%
do(mod=lm(value~x1+x2, data=.))
Upvotes: 3
Views: 1019
Reputation: 36104
One option would be to get the predictions as a column in a data.frame using do. The difference from the other answer is the use of data.frame to get the predictions in a column. You can add in the dv variable to this dataset to keep things straight.
df %>%
gather(dv, value, y1, y2, -x1,-x2) %>%
group_by(dv)%>%
do(mod=lm(value ~ x1 + x2, data=.)) %>%
do(data.frame(dv = .$dv, pred = predict(.$mod, newdata = df)))
Source: local data frame [200 x 2]
Groups: <by row>
      dv     pred
   (chr)    (dbl)
1     y1 4.936012
2     y1 4.948939
3     y1 4.992472
4     y1 4.733290
5     y1 4.921581
6     y1 5.115699
7     y1 4.981135
8     y1 4.837326
9     y1 4.641484
10    y1 4.739197
..   ...      ...
The downside of that (to me) is that you don't have the data used for the predictions alongside the actual predicted values. You could certainly cbind the predictions to the prediction dataset, but another useful option is to use augment from package broom within do. In this second alternative I use augment within the first call to do, although it's not required.
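As a rough sketch of the cbind route mentioned above (not necessarily how one would have to write it), the prediction data can be bound to the predictions inside the second do. The name newdat is hypothetical and stands for whatever fixed newdata you want to predict on; set.seed is added only so the sketch is reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: only here for reproducibility
df <- data.frame(
  y1 = rnorm(100, mean = 5, sd = 1),
  y2 = rnorm(100, mean = 3, sd = 1),
  x1 = rnorm(100, mean = 10, sd = 5),
  x2 = rnorm(100, mean = 5, sd = 2)
)

# hypothetical fixed newdata: here just the predictor columns of df
newdat <- df[, c("x1", "x2")]

preds <- df %>%
  gather(dv, value, y1, y2) %>%
  group_by(dv) %>%
  do(mod = lm(value ~ x1 + x2, data = .)) %>%
  # bind the model label, the prediction data, and the predictions together
  do(cbind(dv = .$dv, newdat, pred = predict(.$mod, newdata = newdat)))

head(preds)
```

Each model contributes one row per row of newdat, so the result should have the dv label, the predictor columns, and a pred column side by side.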
You can supply the dataset you want to predict with (and add the predictions to) via the newdata argument of augment. In this example I used the dataset df2 (just the independent variable columns of your df dataset).
library(broom)
df2 = df[ , 3:4] # Dataset for predictions
df %>%
gather(dv, value, y1, y2, -x1,-x2) %>%
group_by(dv)%>%
do( augment(lm(value ~ x1 + x2, data=.), newdata = df2) )
Source: local data frame [200 x 5]
Groups: dv [2]
      dv        x1       x2  .fitted   .se.fit
   (chr)     (dbl)    (dbl)    (dbl)     (dbl)
1     y1  5.863764 6.201406 4.936012 0.1521102
2     y1  4.419014 7.028888 4.948939 0.1936563
3     y1  7.917369 6.081930 4.992472 0.1255001
4     y1  4.338864 4.019565 4.733290 0.1842635
5     y1 13.307611 2.674705 4.921581 0.1757911
6     y1 14.986879 4.666154 5.115699 0.1614377
7     y1 12.941636 3.679022 4.981135 0.1409247
8     y1  7.474526 4.088868 4.837326 0.1310659
9     y1  2.136858 3.706184 4.641484 0.2357699
10    y1  9.307190 1.885127 4.739197 0.2008851
..   ...       ...      ...      ...       ...
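To see what the .fitted and .se.fit columns in this output represent, it may help to compare them with base R. The sketch below assumes they correspond to the fit and se.fit components that predict returns for an lm when se.fit = TRUE; it fits a single model only, and the names mod, p, and check are hypothetical.

```r
set.seed(1)  # assumption: only here for reproducibility
x1 <- rnorm(100, mean = 10, sd = 5)
x2 <- rnorm(100, mean = 5, sd = 2)
y1 <- rnorm(100, mean = 5, sd = 1)
df <- data.frame(y1, x1, x2)

# dataset for predictions: just the predictor columns
df2 <- df[, c("x1", "x2")]

mod <- lm(y1 ~ x1 + x2, data = df)

# se.fit = TRUE returns a list with the predictions (fit)
# and their standard errors (se.fit)
p <- predict(mod, newdata = df2, se.fit = TRUE)

# put the prediction data next to the predictions
check <- data.frame(df2, .fitted = p$fit, .se.fit = p$se.fit)
head(check)
```

Because df2 here holds the same predictor values the model was fit on, the predictions coincide with the model's fitted values; with genuinely new data they would differ.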
Upvotes: 3