Eric

Reputation: 111

How to reproduce results of predict function in R

Let's say I train a model in R.

predictor_cols <- c(4,5,6,7,8,9,10,11,12,13,15,16,17,18,20,21,22,63,79,90,91,109,125,132,155,175,197,202,210,251,252,279,287,292,300,313,318)
model <- lm(
  as.formula(paste(model_Data[2], "~", paste(model_Data[predictor_cols], collapse = "+"), sep = "")),
  data = model_Data
)

I then use the model to predict an unknown.

prediction <- predict(model,unknown[1,])
1 
8.037219

Instead of using predict, let's pull out the coefficients and do it manually.

model$coefficients
9.250265284
0.054054202
0.052738367
-0.55119556
0.019686046
0.392728331
0.794558094
0.200555755
-0.63218309
0.050404541
0.089660195
-0.04889444
-0.24645514
0.225817891
-0.10411162
0.108317865
0.004281512
0.219695437
0.037514904
-0.00914805
0.077885231
0.656321472
-0.05436867
0.033296525
0.072551915
-0.11498145
-0.03414029
0.081145352
0.11187141
0.690106624
NA
-0.11112986
-0.18002883
0.006238802
0.058387332
-0.04469568
-0.02520228
0.121577926

Looks like the model couldn't find a coefficient for one of the variables.
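
An NA coefficient from lm generally means the predictor is an exact linear combination of other terms in the model, so its effect cannot be estimated separately. The sketch below reproduces that situation on made-up data; `toy`, `x1`, `x2`, `x3` and `fit` are hypothetical names, not part of the model above.

```r
# Toy data (hypothetical): x3 is an exact linear combination of x1 and x2
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$x3 <- toy$x1 + 2 * toy$x2
toy$y  <- 1 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3, data = toy)

# The aliased term is reported with an NA coefficient
coef(fit)

# Names of the coefficients that could not be estimated
na_terms <- names(which(is.na(coef(fit))))
na_terms

# alias() reports which terms are linearly dependent on the others
alias(fit)
```

Running `alias(model)` on the real model should point at the variable behind the NA in the same way.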

Here are the independent variables for our unknown.

2.048475484
1.747222331
-1.240658767
-1.26971135
-0.61858754
-1.186401425
-1.196781456
-0.437969964
-1.37330171
-1.392555895
-0.147275619
0.315190159
0.544014105
-1.137999082
0.464498153
-1.825631473
-1.824991143
0.61730876
-1.311527708
-0.457725059
-0.455920549
-0.196326975
0.636723746
0.128123676
-0.0064055
-0.788435688
-0.493452602
-0.563353694
-0.441559371
-1.083489708
-0.882784077
-0.567873188
1.068504735
1.364721122
0.294178454
2.302875604
-0.998685333

If I multiply each independent variable by its coefficient and add the intercept, the predicted value for the unknown is 8.450137349.

The predict function gave us 8.037219 and the manual calculation gave 8.450137349. What is happening within the predict function that is causing it to predict a different value than the manual calculation? What has to be done to make the values match?

Upvotes: 0

Views: 211

Answers (1)

DaveArmstrong

Reputation: 21757

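I get a lot closer to the predict answer with the code below, which prepends a 1 to `x` for the intercept and drops the NA coefficient together with the matching value of `x` before taking the inner product: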

b <- c(9.250265284, 0.054054202, 0.052738367, -0.55119556, 0.019686046, 0.392728331, 0.794558094, 0.200555755, -0.63218309, 0.050404541, 0.089660195, -0.04889444, -0.24645514, 0.225817891, -0.10411162, 0.108317865, 0.004281512, 0.219695437, 0.037514904, -0.00914805, 0.077885231, 0.656321472, -0.05436867, 0.033296525, 0.072551915, -0.11498145, -0.03414029, 0.081145352, 0.11187141, 0.690106624, NA, -0.11112986, -0.18002883, 0.006238802, 0.058387332, -0.04469568, -0.02520228, 0.121577926)
x <- c(1, 2.048475484, 1.747222331, -1.240658767, -1.26971135, -0.61858754, -1.186401425, -1.196781456, -0.437969964, -1.37330171, -1.392555895, -0.147275619, 0.315190159, 0.544014105, -1.137999082, 0.464498153, -1.825631473, -1.824991143, 0.61730876, -1.311527708, -0.457725059, -0.455920549, -0.196326975, 0.636723746, 0.128123676, -0.0064055, -0.788435688, -0.493452602, -0.563353694, -0.441559371, -1.083489708, -0.882784077, -0.567873188, 1.068504735, 1.364721122, 0.294178454, 2.302875604, -0.998685333)

# remove the missing value in `b` and the corresponding value in `x`
x <- x[-31]
b <- b[-31]
x %*% b
#          [,1]
# [1,] 8.036963
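
The same idea can be checked without copying numbers by hand. This is a minimal sketch on made-up data (the names `toy`, `fit`, `new_obs`, `keep`, `X` and `manual` are hypothetical): it builds the design row with `model.matrix`, drops the columns whose coefficients are NA, and compares the result with `predict`. It assumes a plain `lm` fit with no offset or weights.

```r
# Toy data (hypothetical): x3 is aliased with x1 and x2, so its coefficient is NA
set.seed(42)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$x3 <- toy$x1 - toy$x2
toy$y  <- 2 + toy$x1 + 0.5 * toy$x2 + rnorm(30, sd = 0.2)

fit <- lm(y ~ x1 + x2 + x3, data = toy)
new_obs <- toy[1, ]

b    <- coef(fit)
keep <- !is.na(b)                      # estimable coefficients only

# Design row for the new observation, including the intercept column
X <- model.matrix(delete.response(terms(fit)), data = new_obs)

# Manual prediction using only the columns that have a coefficient
manual <- drop(X[, keep, drop = FALSE] %*% b[keep])

# predict() may warn about a rank-deficient fit; the values are expected to agree
all.equal(unname(manual), unname(predict(fit, new_obs)))
```

Working from `coef(model)` and the actual row of `unknown` keeps full precision and the correct column order, which avoids alignment slips when values are pasted in by hand.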

Upvotes: 1
