Reputation: 223
How can I apply multiple transformers to a single pandas DataFrame column using the ColumnTransformer API?
For example, I want to take the cube root of the values in a DataFrame column and then standardize the result:
df = pd.DataFrame(
    np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
    columns=['a', 'b', 'c']
)
transformer = ColumnTransformer(
    [
        ('root3_std', StandardScaler() + FunctionTransformer(np.cbrt), 'a')  # <-- pseudocode
    ],
    remainder='passthrough'
)
If I write
transformer = ColumnTransformer(
    [
        ('root3', FunctionTransformer(np.cbrt), 'a'),
        ('standardize', StandardScaler(), 'a')
    ],
    remainder='passthrough'
)
I get two separate columns: one with the cube roots and another with the standardized original values. How can I chain both transformers so that the second is applied to the output of the first?
Upvotes: 1
Views: 1468
Reputation: 2868
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame(
    np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
    columns=['a', 'b', 'c']
)

# Chain the two steps: the cube root runs first, then its output is standardized.
pipe = Pipeline([
    ('function_transformer', FunctionTransformer(np.cbrt)),
    ('standard_scaler', StandardScaler()),
])

# df[['a']] (double brackets) selects a 2D frame, which StandardScaler requires.
pipe.fit_transform(df[['a']])

# Output:
array([[-1.32381804],
       [ 0.23106179],
       [ 1.09275626]])
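The pipeline above transforms column 'a' on its own. To stay within the ColumnTransformer API from the question, one option is to pass the whole Pipeline as the transformer for that column. The following is a minimal sketch under that assumption; the step names and the `root3_std` variable are illustrative, and the column is given as the list `['a']` so that the pipeline receives a 2D selection rather than a 1D vector.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame(
    np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]),
    columns=['a', 'b', 'c']
)

# Cube root first, then standardization of the cube-rooted values.
root3_std = Pipeline([
    ('root3', FunctionTransformer(np.cbrt)),
    ('standardize', StandardScaler()),
])

# Apply the chained steps to column 'a' only; 'b' and 'c' pass through unchanged.
transformer = ColumnTransformer(
    [('root3_std', root3_std, ['a'])],
    remainder='passthrough'
)

result = transformer.fit_transform(df)
print(result)
```

With the default settings, the result should be a NumPy array whose first column holds the transformed values of 'a', followed by the untouched 'b' and 'c' columns.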
Upvotes: 1