FXQuantTrader

Reputation: 6891

Why does R behave strangely with the predict.lm function in the following code?

I have a problem with the code below, which I'm trying to understand:

x <- rnorm(50)
y <- 3 * x + rnorm(50)

df_eq  <- data.frame(x, y)

model1  <- lm(y ~ x - 1)
model2  <- lm(df_eq[,2] ~ df_eq[,1] - 1)

xpred <- data.frame(x = seq(from = -2, to = 2, length.out = 5))

ypred <- predict(object = model1, newdata = xpred)
ypred2 <- predict(object = model2, newdata = xpred)

In the above code, I expect ypred and ypred2 to produce the same results. I get the answer I'm expecting in ypred (5 predicted "yhat" values), but ypred2 produces an error and does not give the expected output.

Can anyone please explain why, in the code above, ypred2 produces an error (in R 2.15.2 at least)?

The only key difference in the code, I think, comes from the way "model1" and "model2" are produced.

My understanding is that, in the predict function, newdata supplies the new set of observations for which we want to predict "yhat" values, based on the model stored in the model1 and model2 objects.

What is fundamentally different about

model1 <- lm(y ~ x - 1)

and

model2 <- lm(df_eq[,2] ~ df_eq[,1] - 1) ?

More importantly, if the answer is straightforward, can somebody explain how they figured out the differences from "under the hood" of R? It would be nice to know how I could understand this kind of problem in the future. I've tried looking at the structures of the objects in the above code, but am no closer to an answer.

Thank you in advance.

Upvotes: 3

Views: 163

Answers (1)

joran

Reputation: 173577

From R's perspective, you hand predict.lm the object model2. It says, "Ok, I've got an lm object here. What are the variable names?"

> formula(model2)
df_eq[, 2] ~ df_eq[, 1] - 1

Ok. The response variable is called df_eq[, 2] and the predictor variable is called df_eq[, 1]. Now, R thinks: "I'm supposed to find those variables (or at least the predictor) in xpred".

Hmmmm. Nothing in there by that name.
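One way to see this name mismatch for yourself is to compare the term labels stored in each fitted model with the column names of xpred. This is only a minimal sketch: the seed is an assumption added for reproducibility, and labels1 and labels2 are illustrative names that do not appear in the question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rnorm(50)
y <- 3 * x + rnorm(50)
df_eq <- data.frame(x, y)

model1 <- lm(y ~ x - 1)
model2 <- lm(df_eq[, 2] ~ df_eq[, 1] - 1)
xpred <- data.frame(x = seq(from = -2, to = 2, length.out = 5))

# The predictor names each model will look for in newdata
labels1 <- attr(terms(model1), "term.labels")
labels2 <- attr(terms(model2), "term.labels")
print(labels1)  # "x"
print(labels2)  # "df_eq[, 1]"

# Are those names available as columns of xpred?
print(labels1 %in% names(xpred))  # TRUE
print(labels2 %in% names(xpred))  # FALSE
```

Checking terms() and formula() on a fitted object is usually the quickest way to find out which names predict will try to resolve.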

The actual warning is thrown by model.frame.default, I believe, while it attempts to build an appropriate model frame. Because it cannot find the predictor in xpred, it falls back to the original data values used to fit the model.
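The fallback can be checked directly. The sketch below (assumed seed; msgs is an illustrative name) captures any warning raised by predict and shows that the result has one value per original observation rather than one per row of xpred.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rnorm(50)
y <- 3 * x + rnorm(50)
df_eq <- data.frame(x, y)
model2 <- lm(df_eq[, 2] ~ df_eq[, 1] - 1)
xpred <- data.frame(x = seq(from = -2, to = 2, length.out = 5))

# Collect warning messages instead of letting them print
msgs <- character(0)
ypred2 <- withCallingHandlers(
  predict(model2, newdata = xpred),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)            # the warning text about mismatched row counts
print(length(ypred2))  # 50, not 5

# The "predictions" are just the fitted values for the original data
print(all.equal(unname(ypred2), unname(fitted(model2))))  # TRUE
```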

The correct idiom for fitting models generally (lm or otherwise) would be like this:

lm(y ~ x, data = df_eq)

Don't rely on R picking up the names of objects in your global environment. Specify a data frame with the relevant columns and use those column names in the formula!
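Applied to the question's setup, that idiom might look like the sketch below. It keeps the "- 1" (no intercept) from the original models; the seed and the names model3 and ypred3 are assumptions for illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- rnorm(50)
y <- 3 * x + rnorm(50)
df_eq <- data.frame(x, y)
xpred <- data.frame(x = seq(from = -2, to = 2, length.out = 5))

# Fit using column names and an explicit data argument
model3 <- lm(y ~ x - 1, data = df_eq)

# newdata has a column called x, so predict can match it to the formula
ypred3 <- predict(model3, newdata = xpred)
print(length(ypred3))  # 5

# With no intercept, each prediction is just slope * x
print(all.equal(unname(ypred3), unname(coef(model3)) * xpred$x))  # TRUE
```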

Upvotes: 6
