JohnK

Reputation: 1039

Predict function returns more values than those required

My dataset consists of 60 observations on three variables: x1 and x2, which are my predictors, and y, which is my response. The last 20 observations of y are missing, so I fitted a linear regression model, which I called fit, to the first 40 observations. I have been trying to use the predict function to generate the missing values.

The code I used for the regression is:

fit<-lm(y1a~x1a+x2a)

where y1a, x1a and x2a hold the first 40 observations of y, x1 and x2.

The code I have been using to fill in the remaining values is:

x <- data.frame(data$x1[41:60], data$x2[41:60])

predict(fit,x,interval="prediction",level=0.95)

The problem is that I get 40 new values for y instead of the required 20, along with the warning message:

'newdata' had 20 rows but variables found have 40 rows

Could you please tell me what I am doing wrong?

Upvotes: 0

Views: 2766

Answers (1)

Backlin

Reputation: 14842

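The column names of the data frame passed to predict must match the variable names used in the model formula. Your model was fitted on the vectors x1a and x2a, but if you create x as you show above its columns are named data.x1.41.60. and data.x2.41.60., so predict cannot find x1a and x2a in newdata. It then falls back to the x1a and x2a vectors in your workspace, which hold the 40 training observations. That is why you get 40 predictions and the warning.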

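Try this instead. It fits the model on columns of data, so the same column names are available at prediction time (y ~ . uses every other column as a predictor, so it assumes data contains only y, x1 and x2):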

fit <- lm(y ~ ., data[1:40,])
predict(fit, data[41:60,])
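
For completeness, here is a minimal, self-contained sketch of the same idea that also keeps the prediction intervals requested in the question. The data are simulated stand-ins (the real values are not shown in the question), and the names train, new, predictors and pred are only illustrative.

```r
# Simulated stand-in for the asker's data: 60 rows, y missing in the last 20.
set.seed(1)
data <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(60, sd = 0.5)
data$y[41:60] <- NA

# Fit on the complete rows, referring to columns by name via `data =`.
train <- data[1:40, ]
fit <- lm(y ~ x1 + x2, data = train)

# newdata must contain columns named exactly like the predictors in the formula.
new <- data[41:60, c("x1", "x2")]
predictors <- all.vars(delete.response(terms(fit)))
stopifnot(all(predictors %in% names(new)))

# One row per row of newdata, with columns fit, lwr and upr.
pred <- predict(fit, newdata = new, interval = "prediction", level = 0.95)
dim(pred)

# Fill in the missing responses with the point predictions.
data$y[41:60] <- pred[, "fit"]
```

Checking the predictor names against names(new) before calling predict turns the silent fallback into an immediate error, which is easier to spot than a row-count warning.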

Upvotes: 1
