joseph pareti

Reputation: 97

One-hot-encoding pandas dataframe features according to a list

My intent is to one-hot-encode only the features named in the categorical list. The problem with my code is that it keeps one-hot-encoding features that were already encoded in a previous iteration. How can I prevent this?

import pandas as pd
import numpy as np
data = {
    'apples': [3, 2, 0, np.nan, 2],
    'oranges': [0, 7, 7, 2, 7],
    'figs':[1, np.nan, 10, np.nan, 10],
    'key-customer':['N','Y','Y','N','N'],
    'rating':['L','L','H','L','M'],
    'frequent-cust':['Y', 'N', 'N', 'N', 'Y']
}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David', 'Bob'])
print(purchases)
# one-hot-encode just the features in a given subset
categorical = ['rating', 'key-customer']
for item in categorical:
    d = pd.get_dummies(purchases[item], prefix=item)
    purchases = pd.concat([purchases, d], axis=1)
    purchases.drop(columns=item, inplace=True)
print(purchases)

Upvotes: 0

Views: 47

Answers (1)

Quang Hoang

Reputation: 150805

I would drop the original column and join the dummy columns in its place:

for item in categorical:
    d = pd.get_dummies(purchases[item], prefix=item)
    purchases = purchases.drop(item, axis=1).join(d)

print(purchases)

Output:

        apples  oranges  figs frequent-cust  rating_H  rating_L  rating_M  \
June       3.0        0   1.0             Y         0         1         0   
Robert     2.0        7   NaN             N         0         1         0   
Lily       0.0        7  10.0             N         1         0         0   
David      NaN        2   NaN             N         0         1         0   
Bob        2.0        7  10.0             Y         0         0         1   

        key-customer_N  key-customer_Y  
June                 1               0  
Robert               0               1  
Lily                 0               1  
David                1               0  
Bob                  1               0  
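
As a possible alternative to the loop, `pd.get_dummies` accepts a `columns` argument that restricts the encoding to the listed features and drops the originals in one call. The sketch below wraps that in a hypothetical helper, `encode_listed` (the name is not from the question or the answer), which skips any listed column that is no longer in the frame. It assumes the repeated encoding comes from running the encoding step more than once on the same DataFrame; if the cause is something else, the guard will not help. `dtype=int` is passed only so the dummies appear as 0/1 like the output above, since recent pandas versions return booleans by default.

```python
import numpy as np
import pandas as pd


def encode_listed(df, columns):
    """One-hot-encode only the listed columns that are still present in df."""
    # Hypothetical helper: names already replaced by dummy columns are skipped,
    # so calling this twice does not raise or encode anything a second time.
    present = [c for c in columns if c in df.columns]
    if not present:
        return df.copy()
    # columns= limits get_dummies to these features and drops the originals;
    # dtype=int yields 0/1 instead of booleans.
    return pd.get_dummies(df, columns=present, dtype=int)


purchases = pd.DataFrame(
    {
        'apples': [3, 2, 0, np.nan, 2],
        'oranges': [0, 7, 7, 2, 7],
        'figs': [1, np.nan, 10, np.nan, 10],
        'key-customer': ['N', 'Y', 'Y', 'N', 'N'],
        'rating': ['L', 'L', 'H', 'L', 'M'],
        'frequent-cust': ['Y', 'N', 'N', 'N', 'Y'],
    },
    index=['June', 'Robert', 'Lily', 'David', 'Bob'],
)
categorical = ['rating', 'key-customer']

encoded = encode_listed(purchases, categorical)
# A second call finds neither 'rating' nor 'key-customer' and changes nothing.
encoded_again = encode_listed(encoded, categorical)
print(encoded_again)
```

Because the helper returns a new frame instead of modifying `purchases` in place, the original data stays available if the encoding needs to be redone with a different list. `frequent-cust` is left untouched because it is not in `categorical`.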

Upvotes: 1
