Reputation: 81
I would like to ask what the difference is between OneHotEncoder and mode_onehot_pipe in the following code:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

mode_onehot_pipe = Pipeline([
    ('encoder', SimpleImputer(strategy='most_frequent')),
    ('one hot encoder', OneHotEncoder(handle_unknown='ignore'))])

transformer = ColumnTransformer([
    ('one hot', OneHotEncoder(handle_unknown='ignore'), ['Gender', 'Age', 'Working_Status', 'Annual_Income', 'Visit_Duration', 'Spending_Time', 'Outlet_Location', 'Member_Card', 'Average_Spending']),
    ('mode_onehot_pipe', mode_onehot_pipe, ['Visit_Plan'])], remainder='passthrough')
Thanks a lot!
Upvotes: 0
Views: 52
Reputation: 5324
The main difference between the two is how they handle NaN values. mode_onehot_pipe first replaces each NaN with the most frequent value in the column, as configured in the SimpleImputer, whereas OneHotEncoder on its own (in scikit-learn 0.24 and later) treats NaN as a separate category.

If you pass the same feature to both, and that feature contains NaN when the encoder is fitted, the plain OneHotEncoder will produce one extra output column, which represents the NaN values.
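
To make the difference concrete, here is a minimal sketch on a made-up single column (the toy data and the step name 'imputer' are my own assumptions, not taken from the question). It assumes scikit-learn 0.24 or later, where OneHotEncoder accepts missing values; older versions are expected to raise an error on the plain encoder instead.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with one missing value (not the asker's data).
X = np.array([["yes"], ["no"], ["yes"], [np.nan]], dtype=object)

# Plain encoder: NaN is kept as its own category.
plain = OneHotEncoder(handle_unknown="ignore")
plain_out = plain.fit_transform(X).toarray()

# Pipeline: NaN is first replaced by the mode ("yes"), then encoded.
mode_onehot_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("one hot encoder", OneHotEncoder(handle_unknown="ignore")),
])
pipe_out = mode_onehot_pipe.fit_transform(X).toarray()

print(plain_out.shape)  # (4, 3): columns for "no", "yes" and NaN
print(pipe_out.shape)   # (4, 2): columns for "no" and "yes" only
```

In the pipeline output, the row that originally held NaN is encoded exactly like a "yes" row, so the information that the value was missing is lost. Whether that is desirable depends on whether missingness itself is informative for your model.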
Upvotes: 1