Reputation: 71
The PLS regression using sklearn gives very poor prediction results. After fitting the model I cannot find the "intercept" anywhere. Perhaps this affects the model's predictions? The score and loading matrices look fine, and so does the arrangement of the coefficients. In any case, how do I obtain the intercept from the attributes the model already exposes?
This code returns the coefficients of the variables:
from pandas import DataFrame
from sklearn.cross_decomposition import PLSRegression

X = DataFrame({
    'x1': [0.0, 1.0, 2.0, 2.0],
    'x2': [0.0, 0.0, 2.0, 5.0],
    'x3': [1.0, 0.0, 2.0, 4.0],
}, columns=['x1', 'x2', 'x3'])

Y = DataFrame({
    'y': [-0.2, 1.1, 5.9, 12.3],
}, columns=['y'])

def regPLS1(X, Y):
    _COMPS_ = len(X.columns)  # use all latent variables
    model = PLSRegression(_COMPS_).fit(X, Y)
    return model.coef_
The result is:
regPLS1(X,Y)
>>> array([[ 0.84], [ 2.44], [-0.46]])
In addition to these coefficients, the intercept should be 0.26. What am I doing wrong?
EDIT: The correct predicted response Y_hat (exactly matching the observed Y) is:
Y_hat = [-0.2 1.1 5.9 12.3]
Upvotes: 1
Views: 6424
Reputation: 46
To calculate the intercept, use the following:
import numpy
plsModel = PLSRegression(_COMPS_).fit(X, Y)
y_intercept = plsModel.y_mean_ - numpy.dot(plsModel.x_mean_, plsModel.coef_)
I got the formula directly from the R "pls" package:
BInt[1,,i] <- object$Ymeans - object$Xmeans %*% B[,,i]
I tested the results and got the same intercepts from R's 'pls' package and from scikit-learn.
Upvotes: 2
Reputation: 657
Based on my reading of the _PLS implementation, the formula is Y = XB + Err, where model.coef_ is the estimate of B. If you look at the predict method, it appears to use the fitted parameter y_mean_ as the Err term, so I believe that's what you want: for the intercept, look at model.y_mean_ rather than model.coef_. Hope this helps!
Upvotes: 2