Pandas: Reshaping Long Data to Wide with duplicated columns

Question

I need to pivot a long pandas DataFrame to wide format. The issue is that some ids have multiple values for the same parameter, and some parameters are present for only a few ids.

import pandas as pd

df = pd.DataFrame({'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
                   'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
                   'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35]})

indx param  value
11  a   100
11  b   54
11  b   65
11  c   65
12  a   789
12  b   24
12  d   98
13  a   24
13  b   27
13  c   75
13  c   35

I would like to get something like this:

indx  a    b      c      d
11    100  54,65  65     None
12    789  24     None   98
13    24   27     75,35  None

or

indx  a    b     b1    c     c1    d
11    100  54    65    65    None  None
12    789  24    None  None  None  98
13    24   27    None  75    35    None

So, obviously, a direct df.pivot() is not a solution: it fails on the duplicate (indx, param) pairs.
Thanks for any help.

Scott Boston · Accepted Answer

Option 1: join the duplicate values into one comma-separated string per cell:

df.astype(str).groupby(['indx', 'param'])['value'].agg(','.join).unstack()

Output:

param    a      b      c    d
indx                         
11     100  54,65     65  NaN
12     789     24    NaN   98
13      24     27  75,35  NaN
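
For readers who want to see what each step of Option 1 does, here is a minimal sketch of the same idea split into stages. It is a variation rather than the exact one-liner above: it casts only the value column to strings (an assumption about what is wanted), so indx stays an integer index. The names joined and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'indx':  [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Cast only the value column to str so ','.join can concatenate it;
# indx keeps its integer dtype (unlike df.astype(str)).
joined = (
    df.assign(value=df['value'].astype(str))
      .groupby(['indx', 'param'])['value']
      .agg(','.join)  # one string per (indx, param) pair, in row order
)

# Move the 'param' index level into the columns; missing pairs become NaN.
wide = joined.unstack('param')
print(wide)
```

Because every cell is now a string, this layout is convenient for display but the values need to be split and converted again before any numeric work.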

Option 2: give each repeated value its own numbered column:

df_out = df.set_index(['indx', 'param', df.groupby(['indx','param']).cumcount()])['value'].unstack([1,2])
df_out.columns = [f'{i}_{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
df_out.reset_index()

Output:

   indx      a     b   b_1     c     d   c_1
0    11  100.0  54.0  65.0  65.0   NaN   NaN
1    12  789.0  24.0   NaN   NaN  98.0   NaN
2    13   24.0  27.0   NaN  75.0   NaN  35.0
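
The output above leaves c_1 after d and turns the values into floats because of the NaN cells. If that matters, the following sketch shows one possible variation of Option 2, assuming you want the columns ordered by parameter and the values kept as integers. The sort_index call and the nullable 'Int64' dtype are additions of this sketch, not part of the answer's code, and occurrence and wide are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'indx':  [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Number repeated (indx, param) pairs 0, 1, 2, ... in order of appearance.
occurrence = df.groupby(['indx', 'param']).cumcount()

wide = (
    df.set_index(['indx', 'param', occurrence])['value']
      .unstack([1, 2])     # param and occurrence become column levels
      .sort_index(axis=1)  # order columns by (param, occurrence)
)

# Flatten the two column levels: first occurrence keeps the bare name,
# later occurrences get a numeric suffix (b_1, c_1, ...).
wide.columns = [p if n == 0 else f'{p}_{n}' for p, n in wide.columns]

# Nullable integers keep 54 as 54 (not 54.0) and show gaps as <NA>.
wide = wide.astype('Int64').reset_index()
print(wide)
```

The cumcount step is what makes the reshape possible: it adds a third key that makes every (indx, param, occurrence) combination unique, which unstack requires.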
