Reputation: 97
My intent is to one-hot encode only the features in the categorical list. The problem with my code is that it keeps one-hot encoding features that were already encoded in a previous iteration. How can I prevent this?
import pandas as pd
import numpy as np
data = {
    'apples': [3, 2, 0, np.nan, 2],
    'oranges': [0, 7, 7, 2, 7],
    'figs': [1, np.nan, 10, np.nan, 10],
    'key-customer': ['N', 'Y', 'Y', 'N', 'N'],
    'rating': ['L', 'L', 'H', 'L', 'M'],
    'frequent-cust': ['Y', 'N', 'N', 'N', 'Y']
}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David', 'Bob'])
print(purchases)
# One-hot encode just the features in a given subset
categorical = ['rating', 'key-customer']
for item in categorical:
    d = pd.get_dummies(purchases[item], prefix=item)
    purchases = pd.concat([purchases, d], axis=1)
    purchases.drop(columns=item, inplace=True)
print(purchases)
Upvotes: 0
Views: 47
Reputation: 150805
I would drop the original column and join its dummy columns in a single step:
for item in categorical:
    d = pd.get_dummies(purchases[item], prefix=item)
    purchases = purchases.drop(item, axis=1).join(d)
print(purchases)
Output:
        apples  oranges  figs frequent-cust  rating_H  rating_L  rating_M  \
June       3.0        0   1.0             Y         0         1         0
Robert     2.0        7   NaN             N         0         1         0
Lily       0.0        7  10.0             N         1         0         0
David      NaN        2   NaN             N         0         1         0
Bob        2.0        7  10.0             Y         0         0         1

        key-customer_N  key-customer_Y
June                 1               0
Robert               0               1
Lily                 0               1
David                1               0
Bob                  1               0
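As an alternative to the loop, pd.get_dummies also accepts a columns argument, so the whole subset can be encoded in one call. The sketch below is only one possible way to do it: encode_once is a hypothetical helper name (not from the question), and the check for columns that are still present is an assumption about what "already encoded" means here, namely that the original column has been replaced by its dummy columns.

```python
import numpy as np
import pandas as pd

purchases = pd.DataFrame(
    {
        'apples': [3, 2, 0, np.nan, 2],
        'oranges': [0, 7, 7, 2, 7],
        'figs': [1, np.nan, 10, np.nan, 10],
        'key-customer': ['N', 'Y', 'Y', 'N', 'N'],
        'rating': ['L', 'L', 'H', 'L', 'M'],
        'frequent-cust': ['Y', 'N', 'N', 'N', 'Y'],
    },
    index=['June', 'Robert', 'Lily', 'David', 'Bob'],
)
categorical = ['rating', 'key-customer']


def encode_once(frame, columns):
    """Hypothetical helper: one-hot encode only the listed columns that
    are still present in the frame, and return a new frame."""
    # Columns that were encoded earlier no longer exist under their
    # original name, so they are skipped here.
    remaining = [col for col in columns if col in frame.columns]
    if not remaining:
        return frame.copy()
    # columns= restricts the encoding to the subset; every other column
    # (including 'frequent-cust') is passed through unchanged.
    return pd.get_dummies(frame, columns=remaining, dtype=int)


encoded = encode_once(purchases, categorical)
print(encoded)

# Calling the helper again on the encoded frame leaves it unchanged.
encoded_again = encode_once(encoded, categorical)
```

Passing dtype=int keeps the 0/1 columns shown in the output above; recent pandas versions return boolean dummy columns by default. Because the helper returns a new frame instead of modifying purchases in place, re-running it does not stack a second round of dummy columns onto the result.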
Upvotes: 1