fhgd
fhgd

Reputation: 435

Flatten a one-to-one mapping in a multiindex pandas dataframe

I have the following data structure:

from collections import OrderedDict
import pandas as pd

d = OrderedDict([
    ((5, 3, 1), {'y1': 1}),
    ((5, 3, 2), {'y2': 2}),
    ((5, 4, 1), {'y1': 10}),
    ((5, 4, 2), {'y2': 20}),

    ((6, 3, 1), {'y1': 100}),
    ((6, 3, 2), {'y2': 200}),
    ((6, 4, 1), {'y1': 1000}),
    ((6, 4, 2), {'y2': 2000}),
])

df = pd.DataFrame(
    d.values(),
    index=pd.MultiIndex.from_tuples(d.keys(), names=['x3', 'x2', 'x1']),
)

The table looks like

            y1    y2
x3 x2 x1            
5  3  1      1   NaN
      2    NaN     2
   4  1     10   NaN
      2    NaN    20
6  3  1    100   NaN
      2    NaN   200
   4  1   1000   NaN
      2    NaN  2000

As you can see there is a one-to-one mapping between x1 and the columns (x1=1: y1, x1=2: y2) which I want to flatten into

         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000

How can I do it?

Edit: Or the other way round:

             y
x3 x2 x1            
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000

Upvotes: 2

Views: 228

Answers (2)

jezrael
jezrael

Reputation: 862591

You can use stack for remove NaN, because create Series, remove third level by reset_index and last reshape by unstack:

print (df.stack().reset_index(level=2,drop=True).unstack(2))
           y1      y2
x3 x2                
5  3      1.0     2.0
   4     10.0    20.0
6  3    100.0   200.0
   4   1000.0  2000.0

If need cast to int add astype:

print (df.stack().reset_index(level=2,drop=True).unstack(2).astype(int))
         y1    y2
x3 x2            
5  3      1     2
   4     10    20
6  3    100   200
   4   1000  2000

EDIT:

print (df.stack().reset_index(level=3,drop=True).to_frame('y').astype(int))
             y
x3 x2 x1      
5  3  1      1
      2      2
   4  1     10
      2     20
6  3  1    100
      2    200
   4  1   1000
      2   2000

Upvotes: 2

Romain Jouin
Romain Jouin

Reputation: 4838

df2 = df.unstack()
df2.columns = range(4)
df3 = df2.drop([1,2], axis=1)
df3.columns = ["Y1", "Y2"]
df3

gives

enter image description here

Upvotes: 0

Related Questions