I am currently trying to run the following code:
pv_model <- glm(SalePrice ~ MSSubClass + MSZoning..., data = train)
summary(pv_model)
pv_predict <- predict(pv_model)
train$PV <- pv_predict
However, when I try to assign the predictions as a column in the train data set, I get this error:
Error: Assigned data `predict(pv_model)` must be compatible with existing data.
x Existing data has 730 rows.
x Assigned data has 540 rows.
i Only vectors of size 1 are recycled.
Upon further inspection, my pv_predict vector contains only 540 values, even though train (the data pv_model was fit on) has 730 rows. What accounts for this difference? Why does predict() drop so many rows, and how can I fix this?
Any help is appreciated.
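Missing data in the training set is the most likely cause. With the default na.action (na.omit), glm() drops every row that has an NA in the response or in any predictor, so predict(pv_model) without newdata returns one value only for each row actually used in the fit. Try:
predict(pv_model, newdata=train)
This uses all the rows of train and gives you NA wherever a predictor has missing data.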
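A minimal, self-contained sketch of the behaviour, using a hypothetical 10-row data frame `toy` (standing in for `train`) with missing values injected into one predictor. The data and column names are made up for illustration. Besides `newdata`, it shows `na.action = na.exclude`, a standard `glm()`/`lm()` option that pads predictions back out to the original row count with NA.

```r
# Hypothetical toy data standing in for `train`: 10 rows, with NAs
# in predictor x1 (rows 3, 6, 9), so glm() can use only 7 rows.
toy <- data.frame(
  y  = c(5.1, 6.8, 9.3, 11.2, 12.7, 15.4, 16.9, 19.1, 21.3, 22.8),
  x1 = c(1, 2, NA, 4, 5, NA, 7, 8, NA, 10),
  x2 = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1)
)

# Default na.action (na.omit): incomplete rows are dropped before fitting.
fit_omit <- glm(y ~ x1 + x2, data = toy)
nobs(fit_omit)             # rows actually used in the fit (7 here)
length(predict(fit_omit))  # one prediction per used row (7), not per row of toy

# Fix 1: predict on the full data frame. predict() keeps every row of
# newdata and returns NA where a predictor is missing.
toy$pred_newdata <- predict(fit_omit, newdata = toy)

# Fix 2: refit with na.exclude, which pads fitted values and predictions
# back out to the original row count with NA.
fit_excl <- glm(y ~ x1 + x2, data = toy, na.action = na.exclude)
toy$pred_exclude <- predict(fit_excl)
```

With `na.exclude`, `predict()`, `fitted()` and `residuals()` all return one value per row of the original data, so the result can be assigned straight back as a column. Comparing `nobs(pv_model)` with `nrow(train)` is a quick way to confirm that rows were dropped.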