Reputation: 166
I have a dataframe:
import pandas as pd

df1 = pd.DataFrame(data={'x_axis': ['p1', 'p2', 'p2', 'p4', 'p4'],
                         'y_axis': [1, 2, 3, 1, 5],
                         'error_type_A': ['error 2', 'error 1', 'error 1', 'error 2', 'error 1'],
                         'error_type_B': ['error 3', 'error 4', '', 'error 3', 'error 4']})
What I want is a new dataframe like the one below:
How can I do that?
Upvotes: 0
Reputation: 20689
First set y_axis and x_axis as the index using df.set_index. Then join the remaining columns row by row with df.agg, and finally pivot x_axis into columns using df.unstack:
df1.set_index(['y_axis', 'x_axis']).agg(", ".join, axis=1).unstack(fill_value='')
x_axis                p1                p2                p4
y_axis
1       error 2, error 3                    error 2, error 3
2                         error 1, error 4
3                                  error 1
5                                           error 1, error 4
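One thing to watch with the one-liner above: str.join does not skip empty strings, so a row whose error_type_B is '' joins to 'error 1, ' with a trailing separator. A minimal sketch of a variant that drops empty values before joining follows. The helper name join_non_empty is made up for illustration and is not part of pandas, and the approach assumes the (y_axis, x_axis) pairs are unique, as they are in the sample data.

```python
import pandas as pd

df1 = pd.DataFrame(data={
    'x_axis': ['p1', 'p2', 'p2', 'p4', 'p4'],
    'y_axis': [1, 2, 3, 1, 5],
    'error_type_A': ['error 2', 'error 1', 'error 1', 'error 2', 'error 1'],
    'error_type_B': ['error 3', 'error 4', '', 'error 3', 'error 4'],
})


def join_non_empty(row):
    # Hypothetical helper: join only the non-empty strings of one row.
    return ", ".join(value for value in row if value)


result = (
    df1.set_index(['y_axis', 'x_axis'])   # keys become a two-level index
       .apply(join_non_empty, axis=1)     # one joined string per row
       .unstack(fill_value='')            # x_axis level becomes the columns
)
print(result)
```

The only change from the original chain is swapping agg(", ".join, axis=1) for apply with the filtering helper; the set_index and unstack steps are the same.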
Upvotes: 6
Reputation: 863611
Use DataFrame.melt, remove possible missing values, and reshape with DataFrame.pivot_table. Then add the missing columns and index values with DataFrame.reindex, and finally remove the index and column names with DataFrame.rename_axis:
import numpy as np
import pandas as pd

# treat empty strings as missing so dropna() can remove them
df1 = df1.replace('', np.nan)
# full set of column and index labels wanted in the result
cols = [f'p{x}' for x in range(1, 6)]
idx = range(1, 6)
df1 = (df1.melt(['x_axis', 'y_axis'])
          .dropna()
          .pivot_table(index='y_axis',
                       columns='x_axis',
                       values='value',
                       aggfunc=','.join,
                       fill_value='')
          .reindex(columns=cols, index=idx, fill_value='')
          .rename_axis(index=None, columns=None))
print(df1)
                p1               p2 p3               p4 p5
1  error 2,error 3                     error 2,error 3
2                   error 1,error 4
3                           error 1
4
5                                      error 1,error 4
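To see what the melt step contributes, it can help to look at the intermediate long-format frame on its own. The sketch below rebuilds the question's sample data and stops before the pivot. The names long_df and joined are chosen here for illustration, and the groupby at the end is only a stand-in to show the per-cell join, not the method used in the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={
    'x_axis': ['p1', 'p2', 'p2', 'p4', 'p4'],
    'y_axis': [1, 2, 3, 1, 5],
    'error_type_A': ['error 2', 'error 1', 'error 1', 'error 2', 'error 1'],
    'error_type_B': ['error 3', 'error 4', '', 'error 3', 'error 4'],
})

# Long format: one row per (x_axis, y_axis, error column) combination.
long_df = (
    df1.replace('', np.nan)            # empty strings become missing values
       .melt(['x_axis', 'y_axis'])     # adds 'variable' and 'value' columns
       .dropna()                       # drops the row that had no second error
)
print(long_df)

# Joining the values per (y_axis, x_axis) pair gives the cell contents
# that pivot_table later spreads into a grid.
joined = long_df.groupby(['y_axis', 'x_axis'])['value'].agg(','.join)
print(joined)
```

Because the empty string is dropped before the join, the cell for y_axis 3 and x_axis p2 ends up with a single value and no trailing comma.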
Upvotes: 2