Reputation: 11657
I am having trouble running predict after running a linear regression because I cannot figure out which X variables are actually included in the linear regression.
Let's say I run the model:
model1 <- lm(outcome ~ employee + shape + size + color + I(color^2),
             data = data)
The number of observations identified in the regression output is 224605.
When I try to run predict like so:
test = data.frame(y = predict(model1), x = data$employee)
Error in data.frame(y = predict(model1), x = data$employee) :
arguments imply differing number of rows: 224605, 233262
I thought I could get the correct number of observations like so:
> test = na.omit(data, cols = all.vars(model1))
> nrow(test)
[1] 207256
but this still does not yield the correct number of observations. Is there a direct way to grab the observations actually being used by linear regression?
Upvotes: 3
Views: 2696
Reputation: 145755
Missing observations are omitted by default. If a row has an NA for any of the variables used in the model, it will be omitted. See ?lm and the na.action section for details.
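If the goal is simply to have predictions line up with the original rows, one option that the na.action argument allows is na.exclude. The following is only a sketch on made-up data (the data frame d and its columns are hypothetical, not the asker's data): incomplete rows are still left out of the fit, but predict() pads them with NA.

```r
# Hypothetical toy data with missing values in the outcome and the predictor
d <- data.frame(
  outcome  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  employee = c(1, 2, 3, NA, 5, 6)
)

# na.exclude fits on complete rows only, but remembers which rows were dropped
fit_ex <- lm(outcome ~ employee, data = d, na.action = na.exclude)

# Predictions are padded with NA for the dropped rows,
# so the result has one entry per row of d
pred <- predict(fit_ex)

test <- data.frame(y = pred, x = d$employee)
```

With the default na.omit the same predict() call would return only the four fitted rows, which is what triggers the "differing number of rows" error.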
You can run na.omit(data[c("outcome", "employee", ..., "color")]) to get the data frame with the incomplete rows removed (put all the columns in your formula into the na.omit() call). You can also pull it out of the model object: model1$model is the data frame used for model fitting (with missing values omitted).
You may also want to look into the broom package for tidying up your model. broom::augment is a nice way to add predictions back to the original data.
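A small base R sketch of the first two suggestions, again on made-up data (d, fit, used and same are illustrative names only): the model frame stored in the fit matches na.omit() applied to just the model columns, and its row names point back into the original data.

```r
# Hypothetical toy data; rows 3 and 4 are incomplete
d <- data.frame(
  outcome  = c(1.2, 2.3, NA, 4.1, 5.0, 6.2),
  employee = c(1, 2, 3, NA, 5, 6),
  unused   = c(NA, NA, 1, 1, 1, 1)   # not in the formula, so its NAs do not matter
)

fit <- lm(outcome ~ employee, data = d)

# Data frame actually used for fitting
used <- fit$model

# Same rows via na.omit() restricted to the columns in the formula
same <- na.omit(d[c("outcome", "employee")])

# Row names of the model frame index the original data,
# so fitted values can be paired with the matching rows
test <- data.frame(y = predict(fit), x = d[rownames(used), "employee"])
```

Note that calling na.omit() on the whole data frame would also drop rows that are missing only in columns the model never uses, which can give a smaller row count than the regression reports.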
Upvotes: 2
Reputation: 7654
Try model.frame:
set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10))
df[c(3, 5), 1] <- NA
df[7, 2] <- NA
df
# x y
# 1 -0.6264538 1.51178117
# 2 0.1836433 0.38984324
# 3 NA -0.62124058
# 4 1.5952808 -2.21469989
# 5 NA 1.12493092
# 6 -0.8204684 -0.04493361
# 7 0.4874291 NA
# 8 0.7383247 0.94383621
# 9 0.5757814 0.82122120
# 10 -0.3053884 0.59390132
fit <- lm(y ~ x, df)
model.frame(fit)
# y x
# 1 1.51178117 -0.6264538
# 2 0.38984324 0.1836433
# 4 -2.21469989 1.5952808
# 6 -0.04493361 -0.8204684
# 8 0.94383621 0.7383247
# 9 0.82122120 0.5757814
# 10 0.59390132 -0.3053884
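Building on the same example, here is a sketch of how the kept and dropped rows could be recovered as indices (the names mf, kept, dropped and df_used are just for illustration):

```r
# Recreate the example data and fit from above
set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10))
df[c(3, 5), 1] <- NA
df[7, 2] <- NA
fit <- lm(y ~ x, df)

mf <- model.frame(fit)

# Row names of the model frame are the original row numbers that were kept
kept <- as.integer(rownames(mf))

# na.action() reports the rows removed by the default na.omit
dropped <- na.action(fit)

# Subset of the original data that the regression actually used
df_used <- df[kept, ]
```

The as.integer() conversion assumes the data frame has default (numeric) row names; with custom row names, indexing by rownames(mf) directly would be the safer route.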
Upvotes: 7