I'd like to use both Pipeline and TransformedTargetRegressor to handle all the scaling (of data and target) for a BaggingRegressor and all of its estimators.
My first try works fine (without Pipeline or TransformedTargetRegressor):
$ cat test1.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

def f(x):
    return x*np.cos(x) + np.random.normal(size=500)*2

def main():
    # Generate random data.
    x = np.linspace(0, 10, 500)
    rng = np.random.RandomState(0)
    rng.shuffle(x)
    x = np.sort(x[:])
    y = f(x)
    # Plot random data.
    fig, axis = plt.subplots(1, 1, figsize=(20, 10))
    axis.plot(x, y, 'o', color='black', markersize=2, label='random data')
    # Create bagging models.
    model = BaggingRegressor(n_estimators=5, base_estimator=SVR())
    x_augmented = np.array([x, x**2, x**3, x**4, x**5]).T
    model.fit(x_augmented, y)
    # Plot intermediate regression estimations.
    axis.plot(x, model.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
    for i, tree in enumerate(model.estimators_):
        y_pred = tree.predict(x_augmented)
        axis.plot(x, y_pred, '--', label='tree '+str(i))
    axis.axis('off')
    axis.legend()
    plt.show()

if __name__ == '__main__':
    main()
This looks OK: the bagging regressor's curve is superimposed on all the estimators' curves.
Now I want to use Pipeline and TransformedTargetRegressor to handle all the scaling of data and targets, but it doesn't work: the bagging regressor's curve is on a different scale from the estimators' curves:
$ cat test2.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn import preprocessing
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

def f(x):
    return x*np.cos(x) + np.random.normal(size=500)*2

def main():
    # Generate random data.
    x = np.linspace(0, 10, 500)
    rng = np.random.RandomState(0)
    rng.shuffle(x)
    x = np.sort(x[:])
    y = f(x)
    # Plot random data.
    fig, axis = plt.subplots(1, 1, figsize=(20, 10))
    axis.plot(x, y, 'o', color='black', markersize=2, label='random data')
    # Create bagging models.
    model = BaggingRegressor(n_estimators=5, base_estimator=SVR())
    x_augmented = np.array([x, x**2, x**3, x**4, x**5]).T
    pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
    treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())
    treg.fit(x_augmented, y)
    # Plot intermediate regression estimations.
    axis.plot(x, treg.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
    for i, tree in enumerate(treg.regressor_['model'].estimators_):
        y_hat = tree.predict(x_augmented)
        y_transformer = preprocessing.MinMaxScaler().fit(y.reshape(-1, 1))
        y_pred = y_transformer.inverse_transform(y_hat.reshape(-1, 1))
        axis.plot(x, y_pred, '--', label='tree '+str(i))
    axis.axis('off')
    axis.legend()
    plt.show()

if __name__ == '__main__':
    main()
How do I handle scaling properly on the bagging regressor and all of its nested estimators?
The only difference between the two tests is the use of Pipeline and TransformedTargetRegressor:
$ diff test1.py test2.py
7a8,10
> from sklearn import preprocessing
> from sklearn.pipeline import Pipeline
> from sklearn.compose import TransformedTargetRegressor
27c30,32
< model.fit(x_augmented, y)
---
> pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
> treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())
> treg.fit(x_augmented, y)
30,32c35,39
< axis.plot(x, model.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
< for i, tree in enumerate(model.estimators_):
< y_pred = tree.predict(x_augmented)
---
> axis.plot(x, treg.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
> for i, tree in enumerate(treg.regressor_['model'].estimators_):
> y_hat = tree.predict(x_augmented)
> y_transformer = preprocessing.MinMaxScaler().fit(y.reshape(-1, 1))
> y_pred = y_transformer.inverse_transform(y_hat.reshape(-1, 1))
EDIT
I also tried to use the transformer_ member of the TransformedTargetRegressor instance: test3 (same as test2 up to the following diff) fails too!
$ diff test2.py test3.py
38c38
< y_transformer = preprocessing.MinMaxScaler().fit(y.reshape(-1, 1))
---
> y_transformer = treg.transformer_
I don't think the issue is with your model code, but rather with the plotting part:
# Plot intermediate regression estimations.
axis.plot(x, treg.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
for i, tree in enumerate(treg.regressor_['model'].estimators_):
    y_hat = tree.predict(x_augmented)
    y_transformer = preprocessing.MinMaxScaler().fit(y.reshape(-1, 1))
    y_pred = y_transformer.inverse_transform(y_hat.reshape(-1, 1))
    axis.plot(x, y_pred, '--', label='tree '+str(i))
Each tree here is a fitted SVR(), and you are predicting on the raw x_augmented, whereas during fitting the pipeline scaled x_augmented with a StandardScaler. The predictions therefore don't correspond to what you expect.
So if you change the code to the following snippet, you will be fine:
# Plot intermediate regression estimations.
axis.plot(x, treg.predict(x_augmented), '-', color='red', label=model.__class__.__name__)
for i, tree in enumerate(treg.regressor_['model'].estimators_):
    x_augmented_scaled = treg.regressor_.named_steps['scale'].transform(x_augmented)
    y_hat = tree.predict(x_augmented_scaled)
    y_transformer = preprocessing.MinMaxScaler().fit(y.reshape(-1, 1))
    y_pred = y_transformer.inverse_transform(y_hat.reshape(-1, 1))
    axis.plot(x, y_pred, '--', label='tree '+str(i))
axis.axis('off')
axis.legend()
plt.show()
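Incidentally, instead of refitting a fresh MinMaxScaler on y inside the loop, you can reuse the target scaler that TransformedTargetRegressor already fitted, exposed as treg.transformer_, combined with the manually scaled inputs. A minimal self-contained sketch of the idea (smaller toy data than the question's, and a try/except for the base_estimator/estimator rename across scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn import preprocessing
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 10, 200))
y = x * np.cos(x) + rng.normal(size=200) * 2
X = np.array([x, x**2, x**3]).T

try:
    bagging = BaggingRegressor(n_estimators=5, estimator=SVR())       # scikit-learn >= 1.2
except TypeError:
    bagging = BaggingRegressor(n_estimators=5, base_estimator=SVR())  # older releases

pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', bagging)])
treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())
treg.fit(X, y)

# Reuse the transformers fitted during treg.fit() instead of refitting new ones.
X_scaled = treg.regressor_.named_steps['scale'].transform(X)
per_estimator_preds = []
for svr in treg.regressor_['model'].estimators_:
    y_hat = svr.predict(X_scaled)  # prediction in the scaled target space
    y_pred = treg.transformer_.inverse_transform(y_hat.reshape(-1, 1)).ravel()
    per_estimator_preds.append(y_pred)  # now on the original scale of y
```

Since MinMaxScaler's inverse transform is affine, averaging the per-estimator curves recovers exactly what treg.predict(X) returns, which is a quick way to check that the manual scaling matches the pipeline's. (The question's test3 likely failed only because it used transformer_ without also scaling the inputs; both steps are needed together.)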