baxx
baxx

Reputation: 4705

Unstack pandas index without NaN values

Given the following data:

import pandas as pd
import io 


data = pd.read_csv(
    io.StringIO(
        ",x,t,t_cls,level\n0,70.0,4.472156319636573,med,level_0\n1,30.0,6.4012513032334235,med,level_1\n"
    )
).drop("Unnamed: 0", axis=1)

I can unstack as follows :

data.set_index("level", append=True).unstack(level=1)

Which gives:

('x', 'level_0') ('x', 'level_1') ('t', 'level_0') ('t', 'level_1') ('t_cls', 'level_0') ('t_cls', 'level_1')
0 70 nan 4.47216 nan med nan
1 nan 30 nan 6.40125 nan med

As can be seen - there are nan values on every other col/row - I'd like to be able to have this "unstacked" to a single row rather than spread across two like it currently is.

For example:

enter image description here

And from there I can flatten the columns as needed.

Edit - this sort of works

But is very ugly:

(
    data.set_index("level", append=True)
    .unstack(level=1)
    .apply(lambda x: sorted(x.astype(str)), axis=0)
    .replace({"nan": np.nan})
    .dropna(how="all")
)

Gives:

('x', 'level_0') ('x', 'level_1') ('t', 'level_0') ('t', 'level_1') ('t_cls', 'level_0') ('t_cls', 'level_1')
0 70 30 4.47216 6.40125 med med

Upvotes: 0

Views: 65

Answers (1)

akuiper
akuiper

Reputation: 214987

Create a group variable by level first, then pivot:

data['g'] = data.groupby('level').cumcount()
data
#      x         t t_cls    level  g
#0  70.0  4.472156   med  level_0  0
#1  30.0  6.401251   med  level_1  0

data.pivot('g', 'level', ['x', 't', 't_cls'])
#            x                 t             t_cls        
#level level_0 level_1   level_0   level_1 level_0 level_1
#g                                                        
#0        70.0    30.0  4.472156  6.401251     med     med

Upvotes: 1

Related Questions