bo_

Reputation: 81

Question on ColumnTransformer OneHotEncoder VS mode_onehot_pipe

I would like to ask what the difference is between using OneHotEncoder directly and using mode_onehot_pipe in the ColumnTransformer below:

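from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Impute missing values with the most frequent value, then one-hot encode
mode_onehot_pipe = Pipeline([
    ('encoder', SimpleImputer(strategy = 'most_frequent')),
    ('one hot encoder', OneHotEncoder(handle_unknown = 'ignore'))])

# One-hot encode most columns directly; only Visit_Plan goes through the pipeline
transformer = ColumnTransformer([
    ('one hot', OneHotEncoder(handle_unknown = 'ignore'), ['Gender', 'Age', 'Working_Status', 'Annual_Income', 'Visit_Duration', 'Spending_Time', 'Outlet_Location', 'Member_Card', 'Average_Spending']),
    ('mode_onehot_pipe', mode_onehot_pipe, ['Visit_Plan'])], remainder = 'passthrough')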

Thanks a lot!

Upvotes: 0

Views: 52

Answers (1)

Antoine Dubuis

Reputation: 5324

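The main difference between the two is the way they handle nan values.

mode_onehot_pipe first replaces nan with the most frequent value of the column, as configured in the SimpleImputer, and then one-hot encodes the result, so no nan is left to encode. A plain OneHotEncoder does not impute anything: in recent scikit-learn versions (0.24 and later, as far as I know) it treats nan as a category of its own.

If you pass the same feature containing nan values to both, the plain OneHotEncoder will produce one extra output column, which represents the nan values. In your transformer, this means only Visit_Plan is imputed before encoding.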
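To make the difference concrete, here is a minimal sketch on a made-up Visit_Plan column with a single missing value. The toy data and the step name 'imputer' are my own assumptions, not taken from the question, and the column counts in the comments assume a scikit-learn version in which OneHotEncoder accepts nan (0.24 or later, as far as I know).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column with a missing value
X = pd.DataFrame({"Visit_Plan": ["yes", "no", "yes", np.nan]})

# Plain encoder: nan is kept and encoded as its own category
plain = OneHotEncoder(handle_unknown="ignore")

# Pipeline: nan is replaced by the most frequent value ("yes") before encoding
mode_onehot_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("one hot encoder", OneHotEncoder(handle_unknown="ignore")),
])

plain_out = plain.fit_transform(X).toarray()
pipe_out = mode_onehot_pipe.fit_transform(X).toarray()

print(plain_out.shape)  # (4, 3): columns for "no", "yes" and nan
print(pipe_out.shape)   # (4, 2): columns for "no" and "yes" only
print(pipe_out[3])      # the imputed row is encoded like a "yes" row
```

Which one you want depends on whether "missing" is informative for your model: the plain encoder keeps that information as a separate column, while the pipeline merges the missing rows into the most common category.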

Upvotes: 1
