Stefan
Stefan

Reputation: 1934

Efficiently combine dataframes on 2nd level index

I have two dataframes looking like

import pandas as pd

df1 = pd.DataFrame([2.1,4.2,6.3,8.4,10.5], index=[2,4,6,8,10])
df1.index.name = 't'

df2 = pd.DataFrame(index=pd.MultiIndex.from_tuples([('A','a',1),('A','a',4),
                           ('A','b',5),('A','b',6),('B','c',7),
                           ('B','c',9),('B','d',10),('B','d',11),
                          ], names=('big', 'small', 't')))

I am searching for an efficient way to combine them such that I get

                 0
big small t       
A   a     1    NaN
          2    2.1
          4    4.2
    b     5    NaN
          6    6.3
B   c     7    NaN
          8    8.4
          9    NaN
    d     10  10.5
          11   NaN

I.e. I want to get the index levels 0 and 1 of df2 as index levels 0 and 1 in df1.

Of course a loop over the dataframe would work as well, though not feasible for large dataframes.

EDIT: It appears from comments below that I should add, the indices big and small should be inferred on t in df1 based on the ordering of t.

Upvotes: 0

Views: 63

Answers (2)

ALollz
ALollz

Reputation: 59529

Assuming that you want the unknown index levels to be inferred based on the ordering of 't', we can use an other merge, sort the values and then re-create the MultiIndex using ffill logic (need a Series for this).

res = (df2.reset_index()
          .merge(df1, on='t', how='outer')
          .set_index(df2.index.names)
          .sort_index(level='t'))

res.index = pd.MultiIndex.from_arrays(
                [pd.Series(res.index.get_level_values(i)).ffill()
                 for i in range(res.index.nlevels)],
                names=res.index.names)

print(res)

                 0
big small t       
A   a     1    NaN
          2    2.1
          4    4.2
    b     5    NaN
          6    6.3
B   c     7    NaN
          8    8.4
          9    NaN
    d     10  10.5
          11   NaN

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

Try extracting the level values and reindex:

df2['0'] = df1.reindex(df2.index.get_level_values('t'))[0].values

Output:

                 0
big small t       
A   a     1    NaN
          4    4.2
    b     5    NaN
          6    6.3
B   c     7    NaN
          9    NaN
    d     10  10.5
          11   NaN

For more columns in df1, we can just merge:

(df2.reset_index()
    .merge(df1, on='t', how='left')
    .set_index(df2.index.names)
)

Upvotes: 1

Related Questions