Reputation: 4278
I'm trying to transform the nl column into 6 indicator columns, that is, transform this:
id nl
A 3
B 1
B 5
C 2
C 3
into this:
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A 0 0 1 0 0 0
B 1 0 0 0 1 0
C 0 1 1 0 0 0
With this,
import pandas as pd
# build the dummies from the nl column and keep them next to id
dummies = pd.get_dummies(df['nl'], prefix='nl')
df = df[['id']].join(dummies)
I've managed to get the following:
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A 0 0 1 0 0 0
B 1 0 0 0 0 0
B 0 0 0 0 1 0
C 0 1 0 0 0 0
C 0 0 1 0 0 0
How do I take the last step to get one row per id?
Thanks
Upvotes: 1
Views: 2045
Reputation: 863166
I think you need groupby with max as the aggregation:
df1 = df.groupby('id', as_index=False).max()
print (df1)
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
0 A 0 0 1 0 0 0
1 B 1 0 0 0 1 0
2 C 0 1 1 0 0 0
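To illustrate why max is a reasonable choice here: on 0/1 indicator columns it behaves like a logical OR within each id, so the result stays 0/1 even when the same (id, nl) pair occurs more than once, whereas sum would count the repeats. A minimal sketch, assuming hypothetical data (not from the question) in which the pair (B, 1) is duplicated:

```python
import pandas as pd

# Hypothetical dummy rows: the pair (B, 1) appears twice.
rows = pd.DataFrame({
    'id':   ['A', 'B', 'B', 'B'],
    'nl_1': [0, 1, 1, 0],
    'nl_3': [1, 0, 0, 0],
    'nl_5': [0, 0, 0, 1],
})

# max keeps each indicator at 0/1 (logical OR per group)
as_max = rows.groupby('id', as_index=False).max()

# sum counts occurrences, so the duplicated pair gives 2
as_sum = rows.groupby('id', as_index=False).sum()

print(as_max)
print(as_sum)
```

If the data are known to have no duplicate pairs, both aggregations should give the same table.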
All together (reindex was added to create columns for the codes missing from the sample, 4 and 6; it may not be necessary with real data):
print (df)
id nl
0 A 3
1 B 1
2 B 5
3 C 2
4 C 3
# dtype=int keeps the 0/1 output on pandas >= 2.0, where the default dtype is bool
dummies = pd.get_dummies(df['nl'], prefix='nl', dtype=int)
cols =['nl_' + str(x) for x in range(1, 7)]
print (cols)
['nl_1', 'nl_2', 'nl_3', 'nl_4', 'nl_5', 'nl_6']
dummies = dummies.reindex(columns = cols, fill_value=0)
df = pd.concat([df.id, dummies], axis=1)
print (df)
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
0 A 0 0 1 0 0 0
1 B 1 0 0 0 0 0
2 B 0 0 0 0 1 0
3 C 0 1 0 0 0 0
4 C 0 0 1 0 0 0
df1 = df.groupby('id', as_index=False).max()
print (df1)
id nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
0 A 0 0 1 0 0 0
1 B 1 0 0 0 1 0
2 C 0 1 1 0 0 0
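As a possible alternative to get_dummies followed by groupby, the same table can probably be built in one chain with pd.crosstab. This is a different technique from the one above, offered only as a sketch: it assumes the codes run from 1 to 6 as in the question, and it uses clip to cap counts at 1 in case a pair repeats.

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'C'],
                   'nl': [3, 1, 5, 2, 3]})

result = (
    pd.crosstab(df['id'], df['nl'])             # counts per (id, nl)
      .reindex(columns=range(1, 7), fill_value=0)  # add missing codes 4 and 6
      .clip(upper=1)                            # counts -> 0/1 indicators
      .add_prefix('nl_')                        # 1 -> 'nl_1', ...
      .rename_axis(columns=None)                # drop the leftover 'nl' axis name
      .reset_index()                            # id back to a regular column
)

print(result)
```

Whether this reads better than the two-step version is a matter of taste; the groupby approach makes the intermediate dummy rows easier to inspect.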
Upvotes: 2