Chris Peng
Chris Peng

Reputation: 946

Split a MultiIndex DataFrame base on another DataFrame

Say you have a multiindex DataFrame

    x y z
a 1 0 1 2
  2 3 4 5
b 1 0 1 2
  2 3 4 5
  3 6 7 8
c 1 0 1 2
  2 0 4 6

Now you have another DataFrame which is

 col1 col2
0 a    1
1 b    1
2 b    3
3 c    1
4 c    2

How do you split the multiindex DataFrame based on the one above?

Upvotes: 2

Views: 423

Answers (3)

BENY
BENY

Reputation: 323276

Simply using isin

df[df.index.isin(list(zip(df2['col1'],df2['col2'])))]


Out[342]: 
               0  1  2  3
index1 index2            
a      1       1  0  1  2
b      1       1  0  1  2
       3       3  6  7  8
c      1       1  0  1  2
       2       2  0  4  6

Upvotes: 1

Simon
Simon

Reputation: 333

You can also do this using the MultiIndex reindex method https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html

## Recreate your dataframes
tuples = [('a', 1), ('a', 2),
              ('b', 1), ('b', 2),
              ('b', 3), ('c', 1),
              ('c', 2)]

data = [[1, 0, 1, 2],
        [2, 3, 4, 5],
        [1, 0, 1, 2],
        [2, 3, 4, 5],
        [3, 6, 7, 8],
        [1, 0, 1, 2],
        [2, 0, 4, 6]]

idx = pd.MultiIndex.from_tuples(tuples, names=['index1','index2'])
df= pd.DataFrame(data=data, index=idx)

df2 = pd.DataFrame([['a', 1],
                   ['b', 1],
                   ['b', 3],
                   ['c', 1],
                   ['c', 2]])

# Answer Question
idx_subset = pd.MultiIndex.from_tuples([(a, b) for a, b in df2.values], names=['index1', 'index2'])
out = df.reindex(idx_subset)
print(out)
                   0  1  2  3
index1 index2            
a      1       1  0  1  2
b      1       1  0  1  2
       3       3  6  7  8
c      1       1  0  1  2
       2       2  0  4  6

Upvotes: 0

jezrael
jezrael

Reputation: 862741

Use loc by tuples:

df = df1.loc[df2.set_index(['col1','col2']).index.tolist()]
print (df)
     x  y  z
a 1  0  1  2
b 1  0  1  2
  3  6  7  8
c 1  0  1  2
  2  0  4  6

df = df1.loc[[tuple(x) for x in df2.values.tolist()]]
print (df)
     x  y  z
a 1  0  1  2
b 1  0  1  2
  3  6  7  8
c 1  0  1  2
  2  0  4  6

Or join:

df = df2.join(df1, on=['col1','col2']).set_index(['col1','col2'])
print (df)
           x  y  z
col1 col2         
a    1     0  1  2
b    1     0  1  2
     3     6  7  8
c    1     0  1  2
     2     0  4  6

Upvotes: 2

Related Questions