Arslán
Arslán

Reputation: 1773

Data Transformation in Python

I have the following DataFrame:

ID    MONTHLY_QTY
H1    M1
H1    M2
H1    M3
H1    M4
H2    M1
H2    M4

I need to transform it to something like this:

ID    col1  col2  col3  col4
H1    M1    M2    M3    M4
H2    M1    M2

The number of distinct values of MONTHLY_QTY are limited i.e M1 - M4.

Is there a way I can achieve this in Python ?

Upvotes: 1

Views: 261

Answers (2)

Merlin
Merlin

Reputation: 25639

Starting with this df:

   ID MONTHLY_QTY
0  H1          M1
1  H1          M2
2  H1          M3
3  H1          M4
4  H2          M1
5  H2          M4

dummies = pd.get_dummies(df["MONTHLY_QTY"])
df2     = df.join(dummies)
df2.groupby(['ID' ] )['M1','M2', "M3", "M4" ].sum()

    M1   M2   M3   M4
ID                    
H1  1.0  1.0  1.0  1.0
H2  1.0  0.0  0.0  1.0

or

df2.rename(columns={'M1': 'col1', 'M2': 'col2', 'M3': 'col3','M4': 'col4', })


   col1  col2  col3  col4
ID                        
H1   1.0   1.0   1.0   1.0
H2   1.0   0.0   0.0   1.0

I couldnt figure out how to replace 1 with column header values.

Upvotes: 1

jezrael
jezrael

Reputation: 862661

You can use cumcount with pivot and if need remove NaN fillna:

df['g'] = 'col' + (df.groupby('ID').cumcount() + 1).astype(str)
df = df.pivot(index='ID', columns='g', values='MONTHLY_QTY')
df.fillna('', inplace=True)
print (df)
g  col1 col2 col3 col4
ID                    
H1   M1   M2   M3   M4
H2   M1   M4   

Upvotes: 1

Related Questions