Reputation: 133
I am trying to transform this DataFrame:
>>> df
D1 D2
S1 S2 S1 S2
0 -4 1 -6
1 -5 2 -7
3 -8 3 -9
to something like:
>>> df
0 1 2 3
D1 -4 -5 NaN -8
D2 NaN -6 -7 -9
Essentially the values in columns (D1,S1)
and (D2,S1)
need to be merged into column names and the values in columns (D1,S2)
and (D2,S2)
become the only data values.
How could I achieve that simply and efficiently ?
Upvotes: 2
Views: 1305
Reputation: 879501
The desired DataFrame have the D
s in the index. stack
moves column level values into the index. So it is natural to think about calling stack
:
In [43]: result = df.stack(level=0); result
Out[43]:
S1 S2
0 D1 0 -4
D2 1 -6
1 D1 1 -5
D2 2 -7
2 D1 3 -8
D2 3 -9
The desired DataFrame only has D
s in the index, so let's drop the outer level values:
In [44]: result.index = result.index.droplevel(0); result
Out[44]:
S1 S2
D1 0 -4
D2 1 -6
D1 1 -5
D2 2 -7
D1 3 -8
D2 3 -9
We now want the S1
column to become a column level value. That can be done by first moving the S1
column into the index,
In [45]: result = result.set_index('S1', append=True); result
Out[45]:
S2
S1
D1 0 -4
D2 1 -6
D1 1 -5
D2 2 -7
D1 3 -8
D2 3 -9
and then calling unstack
to move the index level to a column level:
In [46]: result = result.unstack('S1'); result
Out[46]:
S2
S1 0 1 2 3
D1 -4 -5 NaN -8
D2 NaN -6 -7 -9
Putting it all together,
import pandas as pd
df = pd.DataFrame({('D1', 'S1'): [0, 1, 3],
('D1', 'S2'): [-4, -5, -8],
('D2', 'S1'): [1, 2, 3],
('D2', 'S2'): [-6, -7, -9]})
result = df.stack(level=0)
result.index = result.index.droplevel(0)
result = result.set_index('S1', append=True)
result = result.unstack('S1')
result.columns = result.columns.droplevel(0)
print(result)
yields
S1 0 1 2 3
D1 -4 -5 NaN -8
D2 NaN -6 -7 -9
For reference: The four fundamental DataFrame reshaping operations are
set_index
,reset_index
, stack
and unstack
.They are fundamental because a DataFrame is just an array with index and column levels. These operators allow you to move values from the array to index levels to column levels and back. Combining of these 4 operations in various ways can solve a lot of DataFrame reshaping problems.
Upvotes: 3