Reputation: 1773
I have the following DataFrame:
ID MONTHLY_QTY
H1 M1
H1 M2
H1 M3
H1 M4
H2 M1
H2 M4
I need to transform it to something like this:
ID col1 col2 col3 col4
H1 M1 M2 M3 M4
H2 M1 M2
The number of distinct values of MONTHLY_QTY are limited i.e M1 - M4.
Is there a way I can achieve this in Python ?
Upvotes: 1
Views: 261
Reputation: 25639
Starting with this df
:
ID MONTHLY_QTY
0 H1 M1
1 H1 M2
2 H1 M3
3 H1 M4
4 H2 M1
5 H2 M4
dummies = pd.get_dummies(df["MONTHLY_QTY"])
df2 = df.join(dummies)
df2.groupby(['ID' ] )['M1','M2', "M3", "M4" ].sum()
M1 M2 M3 M4
ID
H1 1.0 1.0 1.0 1.0
H2 1.0 0.0 0.0 1.0
or
df2.rename(columns={'M1': 'col1', 'M2': 'col2', 'M3': 'col3','M4': 'col4', })
col1 col2 col3 col4
ID
H1 1.0 1.0 1.0 1.0
H2 1.0 0.0 0.0 1.0
I couldnt figure out how to replace 1 with column header values.
Upvotes: 1
Reputation: 862661
You can use cumcount
with pivot
and if need remove NaN
fillna
:
df['g'] = 'col' + (df.groupby('ID').cumcount() + 1).astype(str)
df = df.pivot(index='ID', columns='g', values='MONTHLY_QTY')
df.fillna('', inplace=True)
print (df)
g col1 col2 col3 col4
ID
H1 M1 M2 M3 M4
H2 M1 M4
Upvotes: 1