Trace

Reputation: 63

Sklearn Pipeline to add new features

Say I have a dataset with a bunch of numerical features. I'm not sure what's the best way to use them in a model, so I decide to apply different transformations to them and add the results to the dataset. These transformations could be MinMax scaling, standard scaling, a log transform, ... whatever you can think of.

So basically, in the raw data I might only have the feature "Value_in_Dollars", and after all the transformations I also want the transformed features in the dataset:

"Value_in_Dollars_MinMax", "Value_in_Dollars_SS", "Value_in_Dollars_Log"

in addition to the original column.

I know how to do this manually, but how would I do this in a Sklearn pipeline? Is this even possible?
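For comparison, the manual approach the question mentions might look roughly like this in pandas. It is a minimal sketch, and the sample values are made up. The standard-scaling line uses the population standard deviation (ddof=0) so that it matches what StandardScaler computes.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a single numeric column
df = pd.DataFrame({"Value_in_Dollars": [0.0, 10.0, 100.0, 1000.0]})
col = df["Value_in_Dollars"]

# Min-max scaling to the range [0, 1]
df["Value_in_Dollars_MinMax"] = (col - col.min()) / (col.max() - col.min())
# Standard scaling with the population std (ddof=0), as StandardScaler does
df["Value_in_Dollars_SS"] = (col - col.mean()) / col.std(ddof=0)
# log(1 + x), which stays finite when the value is 0
df["Value_in_Dollars_Log"] = np.log1p(col)

print(df.columns.tolist())
# ['Value_in_Dollars', 'Value_in_Dollars_MinMax', 'Value_in_Dollars_SS', 'Value_in_Dollars_Log']
```

The drawback of doing this by hand is that min, max, mean and std are taken from whatever data you run it on. A pipeline learns them on the training set only.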

Upvotes: 3

Views: 1460

Answers (1)

Ben Reiniger

Reputation: 12614

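Use a FeatureUnion to apply several transformers to the same input and concatenate their outputs side by side. Then (probably) wrap it in a ColumnTransformer so the union is applied only to the column(s) you want, e.g.: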

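import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler

union = FeatureUnion([("MinMax", MinMaxScaler()),
                      ("SS", StandardScaler()),
                      ("Log", FunctionTransformer(np.log1p))])  # log(1 + x), finite at 0
proc = ColumnTransformer([('trylots', union, ['Value_in_Dollars'])],
                         remainder='passthrough')  # keep the other columns as-is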

Upvotes: 3
