Reputation: 125
I have the following 2 dataframes df1 and df2:
import pandas as pd
m_idx = pd.MultiIndex.from_product([range(3), range(3, 6), range(6, 8)])
m_idx.names = ['a', 'b', 'c']
df1 = pd.DataFrame(None, index=m_idx, columns=['x1', 'x2', 'x3'])
df1.loc[:, 'x1'] = m_idx.get_level_values('a') + m_idx.get_level_values('b') + m_idx.get_level_values('c')
df1.loc[:, 'x2'] = df1.loc[:, 'x1'] * 2
df1.loc[:, 'x3'] = df1.loc[:, 'x1'] * 3
df2 = pd.DataFrame({'a': [0, 2], 'c': [6, 6]})
df1:
x1 x2 x3
a b c
0 3 6 9 18 27
7 10 20 30
4 6 10 20 30
7 11 22 33
5 6 11 22 33
7 12 24 36
1 3 6 10 20 30
7 11 22 33
4 6 11 22 33
7 12 24 36
5 6 12 24 36
7 13 26 39
2 3 6 11 22 33
7 12 24 36
4 6 12 24 36
7 13 26 39
5 6 13 26 39
7 14 28 42
df2:
a c
0 0 6
1 2 6
How can I convert df2 into something I can use to look up the index of df1 where the column names of df2 are the levels and in each row you have the combination of keys you are looking to get out of the df1 index.
Or in other words how can I convert df2 into something that does the equivalent of
df1.loc[pd.IndexSlice[[0, 2], :, [6, 6]], :]
which would return:
x1 x2 x3
a b c
0 3 6 9 18 27
4 6 10 20 30
5 6 11 22 33
2 3 6 11 22 33
4 6 12 24 36
5 6 13 26 39
This is a very simplified and small scale version of what I am actually trying to solve. So really looking to create the pd.IndexSlice on the fly.
I see a separate question that suggested this and I have done something similar in my code BUT it takes too long for my purposes.
df_list = [df1.loc[(v[0], slice(None), v[1]), :] for r, v in df2.iterrows()]
df_sliced = pd.concat(df_list)
So am hoping that using pd.IndexSlice or another alternative instead could be much quicker.
MANY THANKS!
Upvotes: 1
Views: 119
Reputation: 71689
Convert the index of df1
to dataframe then use isin
+ all
on the matching levels to replicate the behaviour of index slice
d = df2.to_dict('list')
df1[df1.index.to_frame()[d].isin(d).all(1)]
x1 x2 x3
a b c
0 3 6 9 18 27
4 6 10 20 30
5 6 11 22 33
2 3 6 11 22 33
4 6 12 24 36
5 6 13 26 39
Upvotes: 1