user9431057
user9431057

Reputation: 1253

Transpose DF columns based on column values - Pandas

My df looks like this,

        param       per     per_date    per_num             
    0   XYZ         1.0     2018-10-01  11.0                
    1   XYZ         2.0     2017-08-01  15.25               
    2   XYZ         1.0     2019-10-01  11.25               
    3   XYZ         2.0     2019-08-01  15.71               
    4   XYZ         3.0     2020-10-01  11.50
    5   XYZ         NaN     NaN         NaN               
    6   MMG         1.0     2021-10-01  11.75               
    7   MMG         2.0     2014-01-01  14.00
    8   MMG         3.0     2021-10-01  12.50               
    9   MMG         1.0     2014-01-01  15.00
    10  LKG         NaN     NaN         NaN               
    11  LKG         NaN     NaN         NaN

I need my output like this,

  param per_1 per_date_1 per_num_1 per_2 per_date_2 per_num_2 per_3 per_date_3 per_num_3
0 XYZ   1     2018-10-01 11.0      2     2017-08-01 15.25     NaN   NaN        NaN        
1 XYZ   1     2019-10-01 11.25     2     2019-08-01 15.71     3     2020-10-01 11.50
2 XYZ   NaN   NaN        NaN       NaN   NaN        NaN       NaN   NaN        NaN
4 MMG   1     2021-10-01 11.75     2     2014-01-01 14.00     3     2021-10-01 12.50
5 MMG   1     2014-01-01 15.00     NaN   NaN        NaN       NaN   NaN        NaN
6 LKG   NaN   NaN        NaN       NaN   NaN        NaN       NaN   NaN        NaN

If you see param column has values that are repeating and transposed column names are created from these values. Also, a new records gets created as soon as param values starts with 1. How can I achieve this?

Upvotes: 1

Views: 251

Answers (1)

jezrael
jezrael

Reputation: 862406

Here main problem are NaNs in last LKG group - first replace missing values by counter created by cumcount and assign to new column per1:

s = df['per'].isna().groupby(df['param']).cumsum()
df = df.assign(per1=df['per'].fillna(s).astype(int))
print (df)
   param  per    per_date  per_num  per1
0    XYZ  1.0  2018-10-01    11.00     1
1    XYZ  2.0  2017-08-01    15.25     2
2    XYZ  1.0  2019-10-01    11.25     1
3    XYZ  2.0  2019-08-01    15.71     2
4    XYZ  3.0  2020-10-01    11.50     3
5    XYZ  NaN         NaN      NaN     1
6    MMG  1.0  2021-10-01    11.75     1
7    MMG  2.0  2014-01-01    14.00     2
8    MMG  3.0  2021-10-01    12.50     3
9    MMG  1.0  2014-01-01    15.00     1
10   LKG  NaN         NaN      NaN     1
11   LKG  NaN         NaN      NaN     2

Then create MultiIndex with groups with compare by 1 and cumulative sum and reshape by unstack:

g = df['per1'].eq(1).cumsum()
df = df.set_index(['param', 'per1',g]).unstack(1).sort_index(axis=1, level=1)
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index(level=1, drop=True).reset_index()
print (df)
  param  per_1  per_date_1  per_num_1  per_2  per_date_2  per_num_2  per_3  \
0   LKG    NaN         NaN        NaN    NaN         NaN        NaN    NaN   
1   MMG    1.0  2021-10-01      11.75    2.0  2014-01-01      14.00    3.0   
2   MMG    1.0  2014-01-01      15.00    NaN         NaN        NaN    NaN   
3   XYZ    1.0  2018-10-01      11.00    2.0  2017-08-01      15.25    NaN   
4   XYZ    1.0  2019-10-01      11.25    2.0  2019-08-01      15.71    3.0   
5   XYZ    NaN         NaN        NaN    NaN         NaN        NaN    NaN   

   per_date_3  per_num_3  
0         NaN        NaN  
1  2021-10-01       12.5  
2         NaN        NaN  
3         NaN        NaN  
4  2020-10-01       11.5  
5         NaN        NaN  

Upvotes: 1

Related Questions