Reputation: 374
I have some data which looks as follows:
Owner  Label1   Label2  Label3
Bob    Dog      N/A     N/A
John   Cat      Mouse   N/A
Lee    Dog      Cat     N/A
Jane   Hamster  Rat     Ferret
And I want it reshaped to one-hot encoding. Something like this:
Owner  Dog  Cat  Mouse  Hamster  Rat  Ferret
Bob    1    0    0      0        0    0
John   0    1    1      0        0    0
Lee    1    1    0      0        0    0
Jane   0    0    0      1        1    1
I've looked through the documentation and Stack Overflow, but haven't been able to find the right functions for this. get_dummies comes close, but it encodes each Label column separately, so every category gets a column-specific prefix (for example Label1_Dog and Label2_Cat) instead of one shared column per animal.
Upvotes: 2
Views: 397
Reputation: 1616
The pandas.get_dummies function converts categorical variables into dummy/indicator variables in a single step.
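Called directly on the frame, get_dummies produces the prefixed columns the question complains about. One possible workaround, sketched here under the assumption that the N/A cells were read in as NaN, is to blank the prefix and then merge the duplicate column names. The sample frame and the names wide and one_hot are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A cells are assumed to be NaN.
df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# Empty prefix and separator: each dummy column is named after the bare
# category, so "Cat" from Label1 and "Cat" from Label2 share a name.
wide = pd.get_dummies(df.set_index("Owner"), prefix="", prefix_sep="", dtype=int)

# Merge the duplicate column names by transposing, grouping on the label,
# taking the max (1 if the label appears in any column), and transposing back.
one_hot = wide.T.groupby(level=0).max().T
print(one_hot)
```

The columns come out in alphabetical order, as in the other answers here.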
Upvotes: 0
Reputation: 294576
Use sklearn.preprocessing.MultiLabelBinarizer:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# Walk the rows: the first value is the owner, the rest are labels (NaN dropped).
o, l = zip(*[[o, [*filter(pd.notna, l)]] for o, *l in zip(*map(df.get, df))])
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)
      Cat  Dog  Ferret  Hamster  Mouse  Rat
Bob     0    1       0        0      0    0
John    1    0       0        0      1    0
Lee     1    1       0        0      0    0
Jane    0    0       1        1      0    1
A more readable version of the same idea:
o = df.Owner
# One list of non-null labels per row
l = [[x for x in l if pd.notna(x)] for l in df.filter(like='Label').values]
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1
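MultiLabelBinarizer sorts the classes it finds, which is why the columns above are alphabetical. If the column order from the question matters, a sketch along these lines passes the order explicitly through the classes parameter. The wanted list is taken from the question's desired output, and the N/A cells are assumed to be NaN.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data from the question; N/A cells are assumed to be NaN.
df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# One list of non-null labels per owner.
labels = [
    [x for x in row if pd.notna(x)]
    for row in df.filter(like="Label").to_numpy()
]

# Column order copied from the desired output in the question.
wanted = ["Dog", "Cat", "Mouse", "Hamster", "Rat", "Ferret"]

mlb = MultiLabelBinarizer(classes=wanted)
encoded = mlb.fit_transform(labels)
one_hot = pd.DataFrame(encoded, index=df["Owner"], columns=mlb.classes_)
print(one_hot)
```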
Upvotes: 2
Reputation: 323396
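Using stack and str.get_dummies (sum(level=0) was removed in pandas 2.0, so group by the index level instead):
df.set_index('Owner').stack().str.get_dummies().groupby(level=0, sort=False).sum()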
Out[535]:
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1
Or with melt and crosstab:
s = df.melt('Owner')
pd.crosstab(s.Owner, s.value)
Out[540]:
value  Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
Jane     0    0       1        1      0    1
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
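Two things to watch with crosstab: it counts occurrences rather than flagging presence, and it sorts the owners alphabetically. The sketch below handles both. The extra owner Sam, who has Dog listed twice, is hypothetical and only there to show the counting; the N/A cells are assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Question data plus one hypothetical owner (Sam) with a repeated label.
df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane", "Sam"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster", "Dog"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat", "Dog"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret", np.nan],
})

# Long format: one row per (Owner, label) pair, with missing labels dropped.
melted = df.melt("Owner").dropna(subset=["value"])

# crosstab counts pairs, so Sam's repeated Dog shows up as 2.
counts = pd.crosstab(melted["Owner"], melted["value"])

# Cap the counts at 1 for a true one-hot table and restore the input row order.
one_hot = counts.clip(upper=1).reindex(df["Owner"].tolist())
print(one_hot)
```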
Upvotes: 4
Reputation: 51425
You could use get_dummies on the stacked dataset, then groupby and sum (sort=False keeps the owners in their original order):
pd.get_dummies(df.set_index('Owner').stack()).groupby('Owner', sort=False).sum()
       Cat  Dog  Ferret  Hamster  Mouse  Rat
Owner
Bob      0    1       0        0      0    0
John     1    0       0        0      1    0
Lee      1    1       0        0      0    0
Jane     0    0       1        1      0    1
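For completeness, here is a self-contained sketch of the stack, get_dummies and groupby approach that also puts the columns back in order of first appearance, which happens to match the layout asked for in the question. The sample frame assumes the N/A cells are NaN, and the names stacked and one_hot are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A cells are assumed to be NaN.
df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# One row per (Owner, column) pair; the explicit dropna keeps the result the
# same regardless of how the installed pandas version treats NaN in stack().
stacked = df.set_index("Owner").stack().dropna()

# Encode each label, then add the rows up per owner without re-sorting them.
one_hot = (
    pd.get_dummies(stacked, dtype=int)
    .groupby(level=0, sort=False)
    .sum()
)

# get_dummies sorts the categories; reorder them by first appearance instead.
one_hot = one_hot.reindex(columns=list(stacked.unique()))
print(one_hot)
```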
Upvotes: 3