Reputation: 608
I need to pivot a long pandas DataFrame to wide format. The issue is that some ids have multiple values for the same parameter, and some parameters are present in only a few ids.
import pandas as pd
df = pd.DataFrame({'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13], 'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'], 'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35]})
indx param value
11 a 100
11 b 54
11 b 65
11 c 65
12 a 789
12 b 24
12 d 98
13 a 24
13 b 27
13 c 75
13 c 35
I want to get something like this:
indx a b c d
11 100 `54,65` 65 None
12 789 24 None 98
13 24 27 `75,35` None
or
indx a b b1 c c1 d
11 100 54 65 65 None None
12 789 24 None None None 98
13 24 27 None 75 35 None
So, obviously, a direct df.pivot() is not a solution, because the same (indx, param) pair occurs more than once.
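To see the failure concretely, here is a minimal sketch assuming the sample frame above (`pivot_error` is just an illustrative name). With repeated (indx, param) pairs, df.pivot() raises a ValueError instead of reshaping:

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# pivot() needs every (index, column) pair to be unique;
# (11, 'b') and (13, 'c') each appear twice, so it cannot reshape.
try:
    df.pivot(index='indx', columns='param', values='value')
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

print(pivot_error)
```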
Thanks for any help.
Upvotes: 0
Views: 348
Reputation: 153460
df.astype(str).groupby(['indx', 'param'])['value'].agg(','.join).unstack()
Output:
param a b c d
indx
11 100 54,65 65 NaN
12 789 24 NaN 98
13 24 27 75,35 NaN
df_out = df.set_index(['indx', 'param', df.groupby(['indx','param']).cumcount()])['value'].unstack([1,2])
df_out.columns = [f'{i}_{j}' if j != 0 else f'{i}' for i, j in df_out.columns]
df_out.reset_index()
Output:
indx a b b_1 c d c_1
0 11 100.0 54.0 65.0 65.0 NaN NaN
1 12 789.0 24.0 NaN NaN 98.0 NaN
2 13 24.0 27.0 NaN 75.0 NaN 35.0
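For reference, a step-by-step sketch of the same cumcount idea, assuming the sample frame from the question. The names `occurrence` and `wide` are illustrative, and the extra sort_index(axis=1) is an addition (not part of the code above) so that each numbered column sits next to its base column:

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Number each row within its (indx, param) group: 0 for the first
# occurrence, 1 for the second, and so on.
occurrence = df.groupby(['indx', 'param']).cumcount()

# With the occurrence number as a third index level every row is unique,
# so levels 1 (param) and 2 (occurrence) can be moved into the columns.
wide = df.set_index(['indx', 'param', occurrence])['value'].unstack([1, 2])

# Sort the (param, occurrence) column pairs so duplicates are adjacent.
wide = wide.sort_index(axis=1)

# Flatten the two column levels: 'b' for occurrence 0, 'b_1' for occurrence 1.
wide.columns = [p if n == 0 else f'{p}_{n}' for p, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

The values come out as floats because the missing cells are filled with NaN.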
Upvotes: 3
Reputation: 608
OK, I found a solution: the df.pivot_table method handles such cases, because it accepts a custom aggregation function:
df.pivot_table(index='indx', columns='param', values='value', aggfunc=lambda x: ','.join(x.astype(str)))
indx a b c d
11 100 54,65 65 NaN
12 789 24 NaN 98
13 24 27 75,35 NaN
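Note that joining with ',' turns the values into strings. If the numbers should stay numeric, one possible variation is to collect each group into a Python list instead. This is only a sketch, not taken from the posts above, and `wide_lists` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'indx': [11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13],
    'param': ['a', 'b', 'b', 'c', 'a', 'b', 'd', 'a', 'b', 'c', 'c'],
    'value': [100, 54, 65, 65, 789, 24, 98, 24, 27, 75, 35],
})

# Collect all values of each (indx, param) pair into a list,
# then move 'param' from the index into the columns.
wide_lists = df.groupby(['indx', 'param'])['value'].agg(list).unstack()

print(wide_lists)
```

Every filled cell is then a list (even single values, e.g. [100]), and missing combinations stay NaN.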
Upvotes: 1