Reputation: 2566
I want to create dummy variables for each unique value. The idea is to turn the data frame into a multi-label target. How can I do it?
Data:
ID L2
A Firewall
A Security
B Communications
C Business
C Switches
Desired Output:
ID Firewall Security Communications Business Switches
A 1 1 0 0 0
B 0 0 1 0 0
C 0 0 0 1 1
I have tried pd.pivot_table, but it requires a column to aggregate on. I have also tried the answer at this link, but it sums the values rather than just producing binary dummy columns. I would much appreciate your help. Thanks a lot!
Upvotes: 6
Views: 2620
Reputation: 3663
You can try this:
df1 = pd.read_csv("file.csv")
df2 = df1.groupby(['ID'])['L2'].apply(','.join).reset_index()
df3 = df2["L2"].str.get_dummies(",")
df = pd.concat([df2, df3], axis = 1)
print(df)
Output:
ID L2 Business Communications Firewall Security Switches
0 A Firewall,Security 0 0 1 1 0
1 B Communications 0 1 0 0 0
2 C Business,Switches 1 0 0 0 1
Alternative Option:
df = df1.groupby(['ID'])['L2'].apply(','.join).str.get_dummies(",").reset_index()
print(df)
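For anyone who wants to try this without a CSV file, here is a minimal self-contained sketch of the same join-then-split idea. The data frame is rebuilt by hand from the question's sample, and the approach assumes that no label contains the "," separator.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# Join each ID's labels into one comma-separated string,
# then split that string into one 0/1 column per label.
# Assumption: no label contains a comma itself.
dummies = (
    df.groupby("ID")["L2"]
      .agg(",".join)
      .str.get_dummies(sep=",")
      .reset_index()
)
print(dummies)
```

If a label could contain a comma, a separator that never appears in the data would be needed instead.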
Upvotes: 2
Reputation: 29635
You can use pivot_table if you set aggfunc=any.
print(df.pivot_table(index='ID', columns='L2',
aggfunc=any, fill_value=False)\
.astype(int))
L2 Business Communications Firewall Security Switches
ID
A 0 0 1 1 0
B 0 1 0 0 0
C 1 0 0 0 1
You may also want to call reset_index at the end to turn ID back into a regular column.
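If pivot_table complains that there is nothing to aggregate, one possible workaround is to give it an explicit value column. The sketch below assumes a hypothetical helper column named flag, which is not part of the original data; it is a variation on the idea above, not the exact call shown.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# "flag" is a hypothetical helper column that gives pivot_table
# something to aggregate; taking the max keeps the result binary
# even if an (ID, L2) pair appears more than once.
wide = (
    df.assign(flag=1)
      .pivot_table(index="ID", columns="L2", values="flag",
                   aggfunc="max", fill_value=0)
      .astype(int)
      .reset_index()
)
wide.columns.name = None  # drop the leftover "L2" label on the column axis
print(wide)
```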
Upvotes: 1
Reputation: 323316
Let us call set_index and then get_dummies. Since each ID can appear on several rows, we need to collapse them by taking the max per ID (index level 0), which keeps the result binary:
s = df.set_index('ID')['L2'].str.get_dummies().groupby(level=0).max().reset_index()
Out[175]:
ID Business Communications Firewall Security Switches
0 A 0 0 1 1 0
1 B 0 1 0 0 0
2 C 1 0 0 0 1
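To see how the choice of aggregation matters when an ID repeats a label, here is a small sketch. The duplicated (A, Firewall) row is an assumption added purely for illustration and is not in the question's data.

```python
import pandas as pd

# Question's sample plus one deliberately repeated (A, Firewall) row.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Firewall", "Security",
           "Communications", "Business", "Switches"],
})

# One indicator row per original row, indexed by ID.
indicators = df.set_index("ID")["L2"].str.get_dummies()

summed = indicators.groupby(level=0).sum()  # counts occurrences per ID
binary = indicators.groupby(level=0).max()  # stays 0/1 per ID

print(summed.loc["A", "Firewall"], binary.loc["A", "Firewall"])
```

Summing produces a count for the repeated label, while the maximum only records whether the label is present.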
Upvotes: 4
Reputation: 150785
Use crosstab, then convert to boolean:
pd.crosstab(df['ID'],df['L2']).astype(bool)
Output:
L2 Business Communications Firewall Security Switches
ID
A False False True True False
B False True False False False
C True False False False True
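If 0/1 integers with ID as a regular column are preferred, as in the desired output of the question, the boolean table can be cast once more. This is only a sketch on the question's sample data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# crosstab counts occurrences; bool collapses any count > 0 to True,
# and int turns True/False into 1/0.
binary = (
    pd.crosstab(df["ID"], df["L2"])
      .astype(bool)
      .astype(int)
      .reset_index()
)
binary.columns.name = None  # drop the leftover "L2" label on the column axis
print(binary)
```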
Upvotes: 8