user2186862

Reputation: 223

sklearn: chaining multiple transformers with ColumnTransformer

How can I apply multiple transformers to a single pandas DataFrame column using the ColumnTransformer API?

For example, I want to take the cube root and then standardize the values in a DataFrame column:

df = pd.DataFrame(
  np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
  columns=['a', 'b', 'c']
)
transformer = ColumnTransformer(
  [
    ('root3_std', StandardScaler() + FunctionTransformer(np.cbrt), ['a'])  # <-- pseudocode, not valid sklearn
  ],
  remainder='passthrough'
)

If I write

transformer = ColumnTransformer(
  [
    ('root3', FunctionTransformer(np.cbrt), ['a']),
    ('standardize', StandardScaler(), ['a'])
  ],
  remainder='passthrough'
)

I get two separate columns, one with the cube roots and another with the standardized original values. How can I apply both transformers in sequence to the same column?

Upvotes: 1

Views: 1468

Answers (1)

qaiser

Reputation: 2868

from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame(
    np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
    columns=['a', 'b', 'c']
)


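# Chain the cube root and the standardization so they run in sequence.
pipe = Pipeline([('function_transformer', FunctionTransformer(np.cbrt)),
                 ('standard_scaler', StandardScaler())])

# Double brackets select a 2D DataFrame, which StandardScaler requires.
pipe.fit_transform(df[['a']])

# Output:
array([[-1.32381804],
       [ 0.23106179],
       [ 1.09275626]])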
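
To stay within the ColumnTransformer API that the question asks about, the same Pipeline can be passed as the transformer for column 'a'. The following is a minimal sketch, not the only way to do it; the step names ('root3', 'standardize', 'root3_std') and the variable names are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame(
    np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
    columns=['a', 'b', 'c']
)

# Chain the two steps: cube root first, then standardization.
# Step names are illustrative, not required by sklearn.
root3_std = Pipeline([
    ('root3', FunctionTransformer(np.cbrt)),
    ('standardize', StandardScaler()),
])

# Passing ['a'] (a list) hands the pipeline a 2D selection,
# which StandardScaler expects; 'b' and 'c' pass through unchanged.
transformer = ColumnTransformer(
    [('root3_std', root3_std, ['a'])],
    remainder='passthrough'
)

result = transformer.fit_transform(df)
print(result)
```

The first column of the result should hold the standardized cube roots of 'a', followed by the untouched 'b' and 'c' columns.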

Upvotes: 1
