Reputation: 63
Say I have a dataset with a bunch of numerical features. I'm not sure what the best way is to use the numerical features in a model, so I decide to apply different transformations to them and add the results to the dataset. These transformations could be MinMax scaling, standard scaling, a log transform, ... whatever you can think of.
So basically, in the raw data I might only have the feature "Value_in_Dollars" and after all transformations I also want to have the transformed features in the dataset:
"Value_in_Dollars_MinMax", "Value_in_Dollars_SS", "Value_in_Dollars_Log"
in addition to the original column.
I know how to do this manually, but how would I do it in a sklearn pipeline? Is this even possible?
Upvotes: 3
Views: 1460
Reputation: 12614
Use FeatureUnion and probably ColumnTransformer, e.g.
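import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler
# Each transformer in the union contributes one output column
union = FeatureUnion([("MinMax", MinMaxScaler()),
                      ("SS", StandardScaler()),
                      ("Log", FunctionTransformer(np.log1p))])
proc = ColumnTransformer([('trylots', union, ['Value_in_Dollars'])],
                         remainder='passthrough')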
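For illustration, here is a minimal end-to-end sketch of the same idea on a toy DataFrame. The data, the extra "Other" column, and the hand-written output column names are assumptions made up for this example. One thing to note: the ColumnTransformer consumes the column it is given, so to keep the untransformed values as well, this sketch adds an identity step (a FunctionTransformer with no function) to the union. That step is my addition, not something from the snippet above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, StandardScaler

# Toy data; "Other" is a hypothetical extra column to show the passthrough.
df = pd.DataFrame({
    "Value_in_Dollars": [0.0, 10.0, 100.0, 1000.0],
    "Other": [1, 2, 3, 4],
})

union = FeatureUnion([
    ("Raw", FunctionTransformer()),          # identity: keeps the original values
    ("MinMax", MinMaxScaler()),
    ("SS", StandardScaler()),
    ("Log", FunctionTransformer(np.log1p)),  # log(1 + x), safe for zeros
])

proc = ColumnTransformer(
    [("trylots", union, ["Value_in_Dollars"])],
    remainder="passthrough",  # untouched columns are appended after the union output
)

out = proc.fit_transform(df)

# Column names are assigned by hand here, in the order the union emits them.
columns = [
    "Value_in_Dollars",
    "Value_in_Dollars_MinMax",
    "Value_in_Dollars_SS",
    "Value_in_Dollars_Log",
    "Other",
]
result = pd.DataFrame(out, columns=columns, index=df.index)
print(result)
```

Since proc is an ordinary transformer, it can be used as the first step of a Pipeline in front of an estimator, so the scalers are fitted on the training data only.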
Upvotes: 3