François M.

Reputation: 4278

Dummification in Python

I'm trying to transform the nl column into 6 dummy columns (nl_1 to nl_6), with one row per id; that is, transform this:

id  nl
A   3
B   1
B   5
C   2
C   3

into this:

id   nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A    0    0    1    0    0    0
B    1    0    0    0    1    0
C    0    1    1    0    0    0

With this code:

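import pandas as pd

# one 0/1 column per code found in nl (dtype=int avoids True/False in pandas >= 2.0)
dummies = pd.get_dummies(df['nl'], prefix='nl', dtype=int)
df = df[['id']].join(dummies)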

I've managed to get the following, but with one row per original row instead of one row per id:

id   nl_1 nl_2 nl_3 nl_4 nl_5 nl_6
A    0    0    1    0    0    0
B    1    0    0    0    0    0
B    0    0    0    0    1    0
C    0    1    0    0    0    0
C    0    0    1    0    0    0

How do I do the last step and combine the rows for each id?

Thanks

Upvotes: 1

Views: 2045

Answers (1)

jezrael

Reputation: 863166

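I think you need groupby with max aggregation. Because every dummy column holds only 0 or 1, the max per id is 1 whenever at least one row of that id has that code: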

df1 = df.groupby('id', as_index=False).max()
print (df1)
  id  nl_1  nl_2  nl_3  nl_4  nl_5  nl_6
0  A     0     0     1     0     0     0
1  B     1     0     0     0     1     0
2  C     0     1     1     0     0     0
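A minimal check that max acts as a per-group logical OR here. It compares the answer's groupby max with groupby(...).any() converted back to 0/1 (an alternative shown only for comparison, not part of the answer). The per-row dummy table from the question is rebuilt by hand, and the names per_row, by_max and by_any are illustrative:

```python
import pandas as pd

# Per-row dummy table from the question (one row per original row)
per_row = pd.DataFrame({
    'id':   ['A', 'B', 'B', 'C', 'C'],
    'nl_1': [0, 1, 0, 0, 0],
    'nl_2': [0, 0, 0, 1, 0],
    'nl_3': [1, 0, 0, 0, 1],
    'nl_4': [0, 0, 0, 0, 0],
    'nl_5': [0, 0, 1, 0, 0],
    'nl_6': [0, 0, 0, 0, 0],
})

# max over 0/1 columns collapses each id to a single row
by_max = per_row.groupby('id', as_index=False).max()

# Same idea phrased as "does any row of this id have the code?", back to 0/1
by_any = per_row.groupby('id').any().astype('int64').reset_index()

print(by_max)
print(by_max.equals(by_any))
```

Both give one row per id; max is simply the shortest way to say "1 if any row has it" when the columns are already 0/1.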

All together; reindex is added for the missing codes (4 and 6 never occur in the sample data), so in the real data it may not be necessary:

print (df)
  id  nl
0  A   3
1  B   1
2  B   5
3  C   2
4  C   3

# dtype=int gives 0/1 instead of True/False in pandas >= 2.0
dummies = pd.get_dummies(df['nl'], prefix='nl', dtype=int)

cols = ['nl_' + str(x) for x in range(1, 7)]
print (cols)
['nl_1', 'nl_2', 'nl_3', 'nl_4', 'nl_5', 'nl_6']

dummies = dummies.reindex(columns=cols, fill_value=0)
df = pd.concat([df['id'], dummies], axis=1)
print (df)
  id  nl_1  nl_2  nl_3  nl_4  nl_5  nl_6
0  A     0     0     1     0     0     0
1  B     1     0     0     0     0     0
2  B     0     0     0     0     1     0
3  C     0     1     0     0     0     0
4  C     0     0     1     0     0     0

df1 = df.groupby('id', as_index=False).max()
print (df1)
  id  nl_1  nl_2  nl_3  nl_4  nl_5  nl_6
0  A     0     0     1     0     0     0
1  B     1     0     0     0     1     0
2  C     0     1     1     0     0     0
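For reuse, the same steps (get_dummies, reindex, concat, groupby max) can be wrapped in one function. This is a sketch, not part of the original answer: dummify_by_id and its parameters are hypothetical names, and codes is assumed to list every code that should get a column (range(1, 7) for this data):

```python
import pandas as pd


def dummify_by_id(df, id_col, code_col, codes):
    """Return one row per id with a 0/1 column for every code in `codes`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # dtype=int asks for 0/1 integers (pandas >= 2.0 otherwise returns booleans)
    dummies = pd.get_dummies(df[code_col], prefix=code_col, dtype=int)

    # Add columns for codes that never occur in the data, in a fixed order
    cols = [f'{code_col}_{c}' for c in codes]
    dummies = dummies.reindex(columns=cols, fill_value=0)

    # Attach the id column, then collapse each id to one row with max (logical OR)
    per_row = pd.concat([df[id_col], dummies], axis=1)
    return per_row.groupby(id_col, as_index=False).max()


df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'C'], 'nl': [3, 1, 5, 2, 3]})
result = dummify_by_id(df, 'id', 'nl', codes=range(1, 7))
print(result)
```

Passing the full list of codes is what makes nl_4 and nl_6 appear even though no row contains them; if the real data always contains every code, the reindex step can be dropped.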

Upvotes: 2
