Reputation: 7404
I've got a dataframe here
Row dist1 dist2 dist3 variable value Smallest Group
0 0 40 101 dist2 40 Smallest SmallestGroup
1 0 40 101 dist3 101 SecondSmallest SecondSmallestGroup
2 1 30 100 dist2 30 Smallest SmallestGroup
3 1 30 100 dist3 100 SecondSmallest SecondSmallestGroup
4 2 30 20 98 dist2 20 Smallest SmallestGroup
5 2 30 20 98 dist1 30 SecondSmallest SecondSmallestGroup
6 3 20 15 72 dist2 15 Smallest SmallestGroup
7 3 20 15 72 dist1 20 SecondSmallest SecondSmallestGroup
8 4 15 16 11 dist3 11 Smallest SmallestGroup
9 4 15 16 11 dist1 15 SecondSmallest SecondSmallestGroup
I'd like to pivot the data so that I have four new columns: Smallest
and SecondSmallest
(which contains the values from the value
column) and SmallestGroup
and SecondSmallestGroup
(which contains the valus from the variable
column).
My desired output is as follows:
Row dist1 dist2 dist3 Smallest SecondSmallest SmallestGroup SecondSmallestGroup
0 NaN 40 101 40 101 dist2 dist3
1 NaN 30 100 30 100 dist2 dist3
I'm certain I can achieve this with a pivot, but am unsure about how to structure the command.
Upvotes: 1
Views: 75
Reputation: 1119
df[['Row','dist1','dist2','dist3']].groupby('Row').first().join(
df.pivot_table(values = 'value',index= 'Row',columns = ['Smallest'],aggfunc='first')).join(
df.pivot_table(values = 'variable',index= 'Row',columns = ['Group'],aggfunc='first'))
Output:
dist1 dist2 dist3 SecondSmallest Smallest SecondSmallestGroup \
Row
0 NaN 40 101 101 40 dist3
1 NaN 30 100 100 30 dist3
2 30.0 20 98 30 20 dist1
3 20.0 15 72 20 15 dist1
4 15.0 16 11 15 11 dist1
SmallestGroup
Row
0 dist2
1 dist2
2 dist2
3 dist2
4 dist3
Upvotes: 0
Reputation: 153460
Since, I don't think the raw data specifies that 'Group' and 'variable' must be asssociated and 'Smallest' and 'value' together, then we have to do this two steps and concatenate.
Let's try this:
(pd.concat([df.set_index(['Row','dist1','dist2','dist3','Group'])['variable'].unstack(),
df.set_index(['Row','dist1','dist2','dist3','Smallest'])['value'].unstack()],
axis=1)
.reset_index())
Output:
Row dist1 dist2 dist3 SecondSmallestGroup SmallestGroup SecondSmallest Smallest
0 0 NaN 40 101 dist3 dist2 101 40
1 1 NaN 30 100 dist3 dist2 100 30
2 2 30.0 20 98 dist1 dist2 30 20
3 3 20.0 15 72 dist1 dist2 20 15
4 4 15.0 16 11 dist1 dist3 15 11
Upvotes: 2