Reputation: 1039
My dataset consists of 60 observations of three variables: x1 and x2, which are my predictors, and y, which is my response. The problem is that the last 20 observations of y are missing, so I fitted a linear regression model, which I called fit, to the first 40 observations, and I have been trying to use the predict function to generate the missing values.
The code for the regression I used is
fit<-lm(y1a~x1a+x2a)
where y1a, x1a and x2a refer to the first 40 observations of y, x1 and x2.
The code I have been using to fill in the remaining values is:
x <- data.frame(data$x1[41:60], data$x2[41:60])
predict(fit,x,interval="prediction",level=0.95)
But now the problem is that I get 40 new values for y instead of the required 20, along with the warning message:
'newdata' had 20 rows but variables found have 40 rows
Could you please tell me what I am doing wrong?
Upvotes: 0
Views: 2766
Reputation: 14842
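The column names of the data frame passed to predict must match the variable names used in the model formula. Your model was fitted on variables named x1a and x2a, but the data frame x you create above has different column names, so predict cannot find the predictors in it. It falls back to the x1a and x2a vectors in your workspace instead, which hold the first 40 observations. That is why you get 40 values and the warning.
Try this instead: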
fit <- lm(y ~ ., data[1:40,])
predict(fit, data[41:60,])
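Here is a minimal sketch of the same idea on simulated data, in case it helps to see it end to end. The values below are made up (the real data set is not shown in the question), and the name newdata is only illustrative; the formula is written out as y ~ x1 + x2 on the assumption that those are the only predictors.

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
data <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(60, sd = 0.5)
data$y[41:60] <- NA  # the last 20 responses are missing

# Fit on the rows where y is observed; the formula refers to column names.
fit <- lm(y ~ x1 + x2, data = data[1:40, ])

# newdata has columns named x1 and x2, so predict() can find them there.
newdata <- data[41:60, c("x1", "x2")]
pred <- predict(fit, newdata, interval = "prediction", level = 0.95)

dim(pred)                        # 20 rows; columns fit, lwr, upr
data$y[41:60] <- pred[, "fit"]   # fill in the missing responses
```

If you would rather keep the model you already fitted with lm(y1a~x1a+x2a), naming the new columns to match should also work, for example data.frame(x1a = data$x1[41:60], x2a = data$x2[41:60]).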
Upvotes: 1