Reputation: 111
Let's say I train a model in R.
model <- lm(as.formula(paste((model_Data)[2],"~",paste((model_Data)[c(4,5,6,7,8,9,10,11,12,13,15,16,17,18,20,21,22,63,79,90,91,109,125,132,155,175,197,202,210,251,252,279,287,292,300,313,318)],collapse="+"),sep="")),data=model_Data)
I then use the model to predict an unknown.
prediction <- predict(model,unknown[1,])
1
8.037219
Instead of using predict, let's pull out the coefficients and do it manually.
model$coefficients
9.250265284
0.054054202
0.052738367
-0.55119556
0.019686046
0.392728331
0.794558094
0.200555755
-0.63218309
0.050404541
0.089660195
-0.04889444
-0.24645514
0.225817891
-0.10411162
0.108317865
0.004281512
0.219695437
0.037514904
-0.00914805
0.077885231
0.656321472
-0.05436867
0.033296525
0.072551915
-0.11498145
-0.03414029
0.081145352
0.11187141
0.690106624
NA
-0.11112986
-0.18002883
0.006238802
0.058387332
-0.04469568
-0.02520228
0.121577926
It looks like the model couldn't estimate a coefficient for one of the variables (the NA above).
Here are the independent variables for our unknown.
2.048475484
1.747222331
-1.240658767
-1.26971135
-0.61858754
-1.186401425
-1.196781456
-0.437969964
-1.37330171
-1.392555895
-0.147275619
0.315190159
0.544014105
-1.137999082
0.464498153
-1.825631473
-1.824991143
0.61730876
-1.311527708
-0.457725059
-0.455920549
-0.196326975
0.636723746
0.128123676
-0.0064055
-0.788435688
-0.493452602
-0.563353694
-0.441559371
-1.083489708
-0.882784077
-0.567873188
1.068504735
1.364721122
0.294178454
2.302875604
-0.998685333
If I multiply each independent variable by its coefficient and add the intercept, the predicted value for the unknown is 8.450137349.
The predict function gave us 8.037219 and the manual calculation gave 8.450137349. What is happening within the predict function that is causing it to predict a different value than the manual calculation? What has to be done to make the values match?
Upvotes: 0
Views: 211
Reputation: 21757
I get a lot closer to the predict() answer when using the code below:
b <- c(9.250265284, 0.054054202, 0.052738367, -0.55119556, 0.019686046, 0.392728331, 0.794558094, 0.200555755, -0.63218309, 0.050404541, 0.089660195, -0.04889444, -0.24645514, 0.225817891, -0.10411162, 0.108317865, 0.004281512, 0.219695437, 0.037514904, -0.00914805, 0.077885231, 0.656321472, -0.05436867, 0.033296525, 0.072551915, -0.11498145, -0.03414029, 0.081145352, 0.11187141, 0.690106624, NA, -0.11112986, -0.18002883, 0.006238802, 0.058387332, -0.04469568, -0.02520228, 0.121577926)
x <- c(1, 2.048475484, 1.747222331, -1.240658767, -1.26971135, -0.61858754, -1.186401425, -1.196781456, -0.437969964, -1.37330171, -1.392555895, -0.147275619, 0.315190159, 0.544014105, -1.137999082, 0.464498153, -1.825631473, -1.824991143, 0.61730876, -1.311527708, -0.457725059, -0.455920549, -0.196326975, 0.636723746, 0.128123676, -0.0064055, -0.788435688, -0.493452602, -0.563353694, -0.441559371, -1.083489708, -0.882784077, -0.567873188, 1.068504735, 1.364721122, 0.294178454, 2.302875604, -0.998685333)
# remove the missing value in `b` and the corresponding value in `x`
x <- x[-31]
b <- b[-31]
x %*% b
# [,1]
# [1,] 8.036963
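As far as I know, an NA coefficient from lm() means the variable was aliased (perfectly collinear with the others), and predict() leaves that column out of the calculation. Here is a minimal sketch of the same idea on simulated data. The data frame `d`, the columns `x1`, `x2`, `x3` and the row `new_obs` are made up for illustration and have nothing to do with the original `model_Data`.

```r
set.seed(1)

# Hypothetical data: x3 is an exact linear combination of x1 and x2,
# so lm() cannot estimate its coefficient and reports NA for it.
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3, data = d)
b <- coef(fit)                      # b["x3"] is NA

# One new observation to predict (also hypothetical)
new_obs <- data.frame(x1 = 0.5, x2 = -1.2, x3 = 0.5 - 1.2)

# Design-matrix row for the new observation, intercept column included
X <- model.matrix(delete.response(terms(fit)), new_obs)

# Drop the NA coefficients and the matching columns, then multiply
keep   <- !is.na(b)
manual <- drop(X[, keep, drop = FALSE] %*% b[keep])

# predict() warns about the rank-deficient fit; suppressed here for brevity
auto <- suppressWarnings(predict(fit, new_obs))

all.equal(unname(manual), unname(auto))
```

Indexing with `!is.na(b)` instead of a hard-coded position like `-31` keeps the coefficient and the predictor vectors aligned if the model changes.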
Upvotes: 1