mortysporty
mortysporty

Reputation: 2889

Join two dataframes and get new frame with indices from dataframes

I want two join two dataframes, with different indices, and return a new dataframe containing the indices from the two frames where there is a match.

df_a contains the resulting value from a number of simulations. df_b, for each Group and three different "Levels", contains the resulting simulation value we want to retrieve. The final output should be a dataframe with each group, level and the matching simulation. For instance for 'Group_1', 'Level_1' simulation, 'Sim_5', is the simulation we want.

df_a = pd.DataFrame({'Group_1': [5, 2, 3, 4, 1],
                     'Group_2': [1, 4, 3, 2, 5],
                     'Group_3': [2, 1, 5, 4, 3]}, 
                    index=['Sim_1', 'Sim_2', 'Sim_3', 'Sim_4', 'Sim_5'])
df_b = pd.DataFrame({'Group_1': [1, 2, 4],
                     'Group_2': [2, 5, 1],
                     'Group_3': [3, 2, 4]}, 
                    index=['Level_1', 'Level_2', 'Level_3'])

# Expected output
df_c = pd.DataFrame(data=['Sim_5', 'Sim_2', 'Sim_4', 'Sim_4', 'Sim_5', 'Sim_1', 'Sim_5', 'Sim_1', 'Sim_4'],
                    index=pd.MultiIndex.from_product([['Group_1', 'Group_2', 'Group_3'], ['Level_1', 'Level_2', 'Level_3']]))

Out[54]: 
                     0
Group_1 Level_1  Sim_5
        Level_2  Sim_2
        Level_3  Sim_4
Group_2 Level_1  Sim_4
        Level_2  Sim_5
        Level_3  Sim_1
Group_3 Level_1  Sim_5
        Level_2  Sim_1
        Level_3  Sim_4

Can anyone think of a nice way to achieve this? I am probably able to 'brute-force' something, but I would prefer a nice clean implementation.

Upvotes: 1

Views: 40

Answers (2)

Henry Yik
Henry Yik

Reputation: 22503

Probably not the cleanest, but just one way of doing it:

a = (df_a.reset_index().melt(id_vars="index").set_index(["variable","value"]))
b = (df_b.reset_index().melt(id_vars="index").set_index(["variable","value"]))

print (b.join(a, lsuffix="_a", rsuffix="_b").reset_index(level=1, drop=True)
        .set_index('index_b', append=True).sort_index())

                index_a index_b
variable value                 
Group_1  1      Level_1   Sim_5
         2      Level_2   Sim_2
         4      Level_3   Sim_4
Group_2  1      Level_3   Sim_1
         2      Level_1   Sim_4
         5      Level_2   Sim_5
Group_3  2      Level_2   Sim_1
         3      Level_1   Sim_5
         4      Level_3   Sim_4

Upvotes: 1

jezrael
jezrael

Reputation: 862761

Use inner DataFrame.merge with unpivoted values by DataFrame.melt, convert to MultiIndex by DataFrame.set_index , remove index names by DataFrame.rename_axis, convert to one column DataFrame by Series.to_frame and sort MultiIndex:

df = (df_a.reset_index().melt('index')
           .merge(df_b.reset_index().melt('index')
                  , on=['variable','value'])
           .set_index(['variable','index_y'])['index_x']
           .rename_axis((None, None))
           .to_frame(0)
           .sort_index())
print (df)
                     0
Group_1 Level_1  Sim_5
        Level_2  Sim_2
        Level_3  Sim_4
Group_2 Level_1  Sim_4
        Level_2  Sim_5
        Level_3  Sim_1
Group_3 Level_1  Sim_5
        Level_2  Sim_1
        Level_3  Sim_4

Upvotes: 1

Related Questions