lddubs

Reputation: 11

Why doesn't Y_pred equal X @ coef_ + intercept_ for sklearn PLSRegression?

I fit a partial least squares regression with Python's sklearn.cross_decomposition.PLSRegression on the example data from the sklearn docs. I am surprised that X @ coef_ + intercept_ does not equal Y_pred. Can someone please explain?

from sklearn.cross_decomposition import PLSRegression
X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
pls2 = PLSRegression(n_components=2)
pls2.fit(X, Y)
Y_pred = pls2.predict(X)
X @ pls2.coef_ + pls2.intercept_

returns

array([[ 6.80991986,  6.88073249],
       [ 6.24687317,  6.24590503],
       [16.37620337, 16.72659034],
       [27.32746904, 28.13552828]])

but Y_pred is

array([[ 0.26087869,  0.15302213],
       [ 0.60667302,  0.45634164],
       [ 6.46856199,  6.48931562],
       [11.7638863 , 12.00132061]])

Upvotes: 1

Views: 705

Answers (1)

dx2-66

Reputation: 2851

PLSRegression always mean-centers the data during fit (see https://github.com/scikit-learn/scikit-learn/issues/10605), and you currently cannot opt out of this. With the default scale=True it also divides each column by its standard deviation.

predict() essentially computes ((X - pls2._x_mean) / pls2._x_std) @ pls2.coef_ + pls2.intercept_, so the raw X has to be centered and scaled the same way before coef_ is applied.

Upvotes: 1
