Reputation: 2889
I want two join two dataframes, with different indices, and return a new dataframe containing the indices from the two frames where there is a match.
df_a
contains the resulting value from a number of simulations. df_b
, for each Group and three different "Levels", contains the resulting simulation value we want to retrieve. The final output should be a dataframe with each group, level and the matching simulation. For instance for 'Group_1', 'Level_1' simulation, 'Sim_5', is the simulation we want.
df_a = pd.DataFrame({'Group_1': [5, 2, 3, 4, 1],
'Group_2': [1, 4, 3, 2, 5],
'Group_3': [2, 1, 5, 4, 3]},
index=['Sim_1', 'Sim_2', 'Sim_3', 'Sim_4', 'Sim_5'])
df_b = pd.DataFrame({'Group_1': [1, 2, 4],
'Group_2': [2, 5, 1],
'Group_3': [3, 2, 4]},
index=['Level_1', 'Level_2', 'Level_3'])
# Expected output
df_c = pd.DataFrame(data=['Sim_5', 'Sim_2', 'Sim_4', 'Sim_4', 'Sim_5', 'Sim_1', 'Sim_5', 'Sim_1', 'Sim_4'],
index=pd.MultiIndex.from_product([['Group_1', 'Group_2', 'Group_3'], ['Level_1', 'Level_2', 'Level_3']]))
Out[54]:
0
Group_1 Level_1 Sim_5
Level_2 Sim_2
Level_3 Sim_4
Group_2 Level_1 Sim_4
Level_2 Sim_5
Level_3 Sim_1
Group_3 Level_1 Sim_5
Level_2 Sim_1
Level_3 Sim_4
Can anyone think of a nice way to achieve this? I am probably able to 'brute-force' something, but I would prefer a nice clean implementation.
Upvotes: 1
Views: 40
Reputation: 22503
Probably not the cleanest, but just one way of doing it:
a = (df_a.reset_index().melt(id_vars="index").set_index(["variable","value"]))
b = (df_b.reset_index().melt(id_vars="index").set_index(["variable","value"]))
print (b.join(a, lsuffix="_a", rsuffix="_b").reset_index(level=1, drop=True)
.set_index('index_b', append=True).sort_index())
index_a index_b
variable value
Group_1 1 Level_1 Sim_5
2 Level_2 Sim_2
4 Level_3 Sim_4
Group_2 1 Level_3 Sim_1
2 Level_1 Sim_4
5 Level_2 Sim_5
Group_3 2 Level_2 Sim_1
3 Level_1 Sim_5
4 Level_3 Sim_4
Upvotes: 1
Reputation: 862761
Use inner DataFrame.merge
with unpivoted values by DataFrame.melt
, convert to MultiIndex
by DataFrame.set_index
, remove index names by DataFrame.rename_axis
, convert to one column DataFrame
by Series.to_frame
and sort MultiIndex
:
df = (df_a.reset_index().melt('index')
.merge(df_b.reset_index().melt('index')
, on=['variable','value'])
.set_index(['variable','index_y'])['index_x']
.rename_axis((None, None))
.to_frame(0)
.sort_index())
print (df)
0
Group_1 Level_1 Sim_5
Level_2 Sim_2
Level_3 Sim_4
Group_2 Level_1 Sim_4
Level_2 Sim_5
Level_3 Sim_1
Group_3 Level_1 Sim_5
Level_2 Sim_1
Level_3 Sim_4
Upvotes: 1