Reputation: 4705
Given the following data:
import pandas as pd
import io
data = pd.read_csv(
io.StringIO(
",x,t,t_cls,level\n0,70.0,4.472156319636573,med,level_0\n1,30.0,6.4012513032334235,med,level_1\n"
)
).drop("Unnamed: 0", axis=1)
I can unstack as follows :
data.set_index("level", append=True).unstack(level=1)
Which gives:
('x', 'level_0') | ('x', 'level_1') | ('t', 'level_0') | ('t', 'level_1') | ('t_cls', 'level_0') | ('t_cls', 'level_1') | |
---|---|---|---|---|---|---|
0 | 70 | nan | 4.47216 | nan | med | nan |
1 | nan | 30 | nan | 6.40125 | nan | med |
As can be seen - there are nan
values on every other col/row - I'd like to be able to have this "unstacked" to a single row rather than spread across two like it currently is.
For example:
And from there I can flatten the columns as needed.
But is very ugly:
(
data.set_index("level", append=True)
.unstack(level=1)
.apply(lambda x: sorted(x.astype(str)), axis=0)
.replace({"nan": np.nan})
.dropna(how="all")
)
Gives:
('x', 'level_0') | ('x', 'level_1') | ('t', 'level_0') | ('t', 'level_1') | ('t_cls', 'level_0') | ('t_cls', 'level_1') | |
---|---|---|---|---|---|---|
0 | 70 | 30 | 4.47216 | 6.40125 | med | med |
Upvotes: 0
Views: 65
Reputation: 214987
Create a group variable by level
first, then pivot
:
data['g'] = data.groupby('level').cumcount()
data
# x t t_cls level g
#0 70.0 4.472156 med level_0 0
#1 30.0 6.401251 med level_1 0
data.pivot('g', 'level', ['x', 't', 't_cls'])
# x t t_cls
#level level_0 level_1 level_0 level_1 level_0 level_1
#g
#0 70.0 30.0 4.472156 6.401251 med med
Upvotes: 1