fghoussen

Reputation: 565

sklearn : scaling x (data) and y (target) using both Pipeline and TransformedTargetRegressor

I'd like to use both Pipeline and TransformedTargetRegressor to handle all the scaling (on the data and on the target). Is it possible to combine Pipeline and TransformedTargetRegressor? And how do I get results out of TransformedTargetRegressor?

$ cat test_ttr.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from sklearn.datasets import make_regression
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

def main():
    x, y = make_regression()

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    model = linear_model.Ridge(alpha=1)

    pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
    treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())

    treg.fit(x_train, y_train)

    print(pipe.get_params()['model__alpha']) # OK !
    print(treg.get_params()['regressor__model__coef']) # KO ?!

if __name__ == '__main__':
    main()

But I can't get results (the coefficients, for instance) out of TransformedTargetRegressor:

1
Traceback (most recent call last):
  File ".\test_ttr.py", line 26, in <module>
    main()
  File ".\test_ttr.py", line 23, in main
    print(treg.get_params()['regressor__model__coef']) # KO ?!
TypeError: 'TransformedTargetRegressor' object is not subscriptable

Upvotes: 3

Views: 1653

Answers (2)

fghoussen

Reputation: 565

The best solution I found is to read the fitted attributes directly from the fitted regressor (I'm not sure accessing members directly is good practice, though):

$ cat test_ttr.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from sklearn.datasets import make_regression
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor

def main():
    x, y = make_regression()

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    model = linear_model.Ridge(alpha=1)

    pipe = Pipeline([('scale', preprocessing.StandardScaler()), ('model', model)])
    treg = TransformedTargetRegressor(regressor=pipe, transformer=preprocessing.MinMaxScaler())

    treg.fit(x_train, y_train)

    # regressor_ is the fitted copy of the pipeline; index it by step name
    print(treg.regressor_['model'].coef_)
    print(treg.regressor_['model'].alpha)

if __name__ == '__main__':
    main()


$ python test_ttr.py
[-1.13077347e-02  4.44189754e-03  2.39262548e-03  1.72868998e-02
  9.98554629e-03  4.66877821e-02 -4.25349208e-03  1.94027088e-03
  5.64007062e-05  3.08491096e-03 -3.50818087e-05 -1.11165790e-02
 -6.67893402e-03 -3.01372675e-03  3.70455557e-03  5.05148384e-03
  9.39056280e-03  5.63774373e-03 -4.07545049e-03 -5.98363493e-03
 -8.21146459e-03  1.20560099e-02  5.79147139e-03 -3.87135045e-03
  3.62289162e-03 -5.32527728e-03  1.05227189e-02 -3.32636550e-03
  2.24062002e-02  5.36611024e-03  4.42517510e-03  2.98492436e-04
 -3.48722166e-03 -8.16323005e-03 -1.74921354e-03 -2.47793718e-03
  2.00056722e-02  9.02842425e-03 -4.22978758e-03  2.37737450e-03
 -7.93388529e-03  1.22910175e-02  1.34225568e-03 -3.51697078e-03
  4.20992326e-03  4.35675123e-03 -8.07619773e-04  1.13628592e-02
  4.12219590e-03  6.92190818e-03 -2.44482599e-03 -3.12429604e-03
 -5.43930166e-03  3.27253280e-02  4.11909724e-03  3.83302056e-03
  1.34754164e-02 -8.62591922e-04 -4.14770516e-03 -7.02794996e-03
 -2.04141679e-03 -8.93807591e-04 -1.50736158e-03  3.51801088e-03
 -1.26757035e-02 -8.46096567e-04  6.70465585e-02 -1.12191639e-02
  6.08120935e-03 -9.07017386e-03 -2.13280853e-03 -2.24764380e-03
  6.98012623e-03 -9.26042982e-03 -2.93708218e-03  5.74605237e-04
 -1.41308272e-03  5.24419314e-03  3.41054848e-02  7.80090716e-03
  7.33259527e-02 -4.78241365e-03  2.38806342e-04  3.84449219e-04
  5.49127586e-02 -6.91505707e-04 -4.14642042e-04  3.43961614e-03
  5.20966922e-04 -5.47828158e-03 -7.04740862e-04  4.68760531e-02
  4.12140344e-03 -5.16221700e-03 -7.35235898e-03  7.68674585e-03
 -4.39094201e-03  5.05034775e-03  5.75523532e-03 -6.17177294e-03]
1
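Note that the coefficients printed above are expressed in the scaled space (standardized x, min-max scaled y), not in the original units. Here is a minimal sketch of how they could be mapped back, assuming the same StandardScaler / MinMaxScaler / Ridge setup as in the script; the random_state values and the names coef_orig and intercept_orig are my own additions for illustration:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is fixed only to make this sketch reproducible (assumption)
x, y = make_regression(random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x_train, y_train)

x_scaler = treg.regressor_.named_steps["scale"]  # fitted StandardScaler
ridge = treg.regressor_.named_steps["model"]     # fitted Ridge
y_scaler = treg.transformer_                     # fitted MinMaxScaler

# StandardScaler: x_scaled = (x - mean_) / scale_
# MinMaxScaler:   y_scaled = y * scale_ + min_
y_scale = y_scaler.scale_[0]
y_min = y_scaler.min_[0]

# Substitute both scalings into the linear model and solve for y
coef_orig = ridge.coef_ / (x_scaler.scale_ * y_scale)
intercept_orig = (
    ridge.intercept_
    - y_min
    - np.sum(ridge.coef_ * x_scaler.mean_ / x_scaler.scale_)
) / y_scale

# The unscaled linear model should reproduce treg.predict
manual_pred = x_test @ coef_orig + intercept_orig
matches = np.allclose(manual_pred, treg.predict(x_test))
print(matches)
```

This only works because every step is linear; with a non-linear target transform (a log, for example) there is no such closed-form conversion.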

Stack Overflow readers, feel free to improve this answer if possible!

Upvotes: 1

Kim Tang

Reputation: 2478

The error occurs in your line

print(treg.get_params()['regressor__model__coef']) # KO ?!

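because TransformedTargetRegressor does not have the parameter 'regressor__model__coef'. get_params() only lists constructor parameters (hyperparameters such as alpha); learned coefficients are fitted attributes, not parameters.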

You can list all the available parameters by calling treg.get_params(), which returns:

{'check_inverse': True,
 'func': None,
 'inverse_func': None,
 'regressor': Pipeline(memory=None,
          steps=[('scale',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ('model',
                  Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
                        normalize=False, random_state=None, solver='auto',
                        tol=0.001))],
          verbose=False),
 'regressor__memory': None,
 'regressor__model': Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
       random_state=None, solver='auto', tol=0.001),
 'regressor__model__alpha': 1,
 'regressor__model__copy_X': True,
 'regressor__model__fit_intercept': True,
 'regressor__model__max_iter': None,
 'regressor__model__normalize': False,
 'regressor__model__random_state': None,
 'regressor__model__solver': 'auto',
 'regressor__model__tol': 0.001,
 'regressor__scale': StandardScaler(copy=True, with_mean=True, with_std=True),
 'regressor__scale__copy': True,
 'regressor__scale__with_mean': True,
 'regressor__scale__with_std': True,
 'regressor__steps': [('scale',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ('model',
   Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
         random_state=None, solver='auto', tol=0.001))],
 'regressor__verbose': False,
 'transformer': MinMaxScaler(copy=True, feature_range=(0, 1)),
 'transformer__copy': True,
 'transformer__feature_range': (0, 1)}
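To illustrate the difference between parameters and fitted results, here is a small sketch. It assumes the same pipeline as in the question; random_state and the variable names are my own additions. After fit, the trained copy of the pipeline is stored in treg.regressor_ (with a trailing underscore), while treg.regressor stays an unfitted template:

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is fixed only to make this sketch reproducible (assumption)
x, y = make_regression(random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x, y)

# get_params() exposes hyperparameters only
params = treg.get_params()
has_alpha = "regressor__model__alpha" in params
has_coef = "regressor__model__coef" in params

# Fitted attributes live on the fitted clone, regressor_
fitted_model = treg.regressor_.named_steps["model"]
coef = fitted_model.coef_

# The pipeline passed to the constructor is cloned, so it is never fitted
template_is_fitted = hasattr(pipe.named_steps["model"], "coef_")

print(has_alpha, has_coef, coef.shape, template_is_fitted)
```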

You can get results such as the R2 score by using

treg.score(x_test, y_test)

which returns

0.7506837388137267

To predict, you can use

treg.predict(x_test)
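The predictions come back in the original target scale, because TransformedTargetRegressor applies the inverse of the target transformer for you. A minimal sketch of what that amounts to, assuming the pipeline from the question (random_state and the variable names are my own additions):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# random_state is fixed only to make this sketch reproducible (assumption)
x, y = make_regression(random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1))])
treg = TransformedTargetRegressor(regressor=pipe, transformer=MinMaxScaler())
treg.fit(x_train, y_train)

# Predictions in the original units of y
pred = treg.predict(x_test)

# Same thing by hand: predict in the scaled space, then undo the y scaling
scaled_pred = treg.regressor_.predict(x_test)
manual = treg.transformer_.inverse_transform(
    scaled_pred.reshape(-1, 1)).ravel()

same = np.allclose(pred, manual)
print(same)
```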

The documentation is very useful and you can read up on it here and here.

Upvotes: 4
