Reputation: 3233
Consider the following DataFrame:
import pandas as pd
import numpy as np
arr = np.random.random((2, 4))
mdf = pd.DataFrame({'cid': ['c1', 'c2']})
pdf = pd.DataFrame({'doc_id': ['d1', 'd1', 'd2', 'd2'], 'passage_id': ['p1', 'p2', 'p1', 'p2']})
index = pd.MultiIndex.from_frame(mdf.join(pdf, how='cross'))
df = pd.DataFrame({'score': arr.flatten()}, index=index)
df is
                          score
cid doc_id passage_id
c1  d1     p1          0.708722
           p2          0.975350
    d2     p1          0.326029
           p2          0.979832
c2  d1     p1          0.147153
           p2          0.381807
    d2     p1          0.525054
           p2          0.245478
Now, if I try to index with a list of tuples that cover only the first two levels:
df.loc[[('c1', 'd1'), ('c2', 'd2')]]
I get the following error:
ValueError: operands could not be broadcast together with shapes (2,2) (3,) (2,2)
Why does this error happen?
I expected the answer to be:
                          score
cid doc_id passage_id
c1  d1     p1          0.708722
           p2          0.975350
c2  d2     p1          0.525054
           p2          0.245478
Upvotes: 1
Views: 725
Reputation: 893
To add to the solutions that have been provided:
df.reset_index(level=2).loc[[('c1', 'd1'), ('c2', 'd2')]].set_index('passage_id', append=True)
I wish I could think of a more elegant solution. Here's the breakdown of what this is doing:
.reset_index(level=2) moves the third index level from the left (passage_id) into the "body" of the DataFrame, as a regular column.
.loc[[('c1', 'd1'), ('c2', 'd2')]] gets the rows that you wanted.
.set_index('passage_id', append=True) moves the passage_id column back into the index.
Upvotes: 0
Reputation: 120409
You can use get_locs:
loc = df.index.get_locs
idx = np.union1d(loc(('c1', 'd1')), loc(('c2', 'd2')))
subdf = df.iloc[idx]
Output:
>>> subdf
                          score
cid doc_id passage_id
c1  d1     p1          0.055452
           p2          0.758224
c2  d2     p1          0.773690
           p2          0.519005
>>> idx
array([0, 1, 6, 7])
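If you have more than two keys, the same idea can be wrapped in a small function. This is only a sketch: select_partial is a hypothetical helper name (not a pandas API), and the frame is rebuilt with fixed scores instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question, but with fixed scores (an assumption
# made here purely so the example is reproducible).
index = pd.MultiIndex.from_product(
    [['c1', 'c2'], ['d1', 'd2'], ['p1', 'p2']],
    names=['cid', 'doc_id', 'passage_id'],
)
df = pd.DataFrame({'score': np.arange(8) / 10}, index=index)


def select_partial(frame, keys):
    """Hypothetical helper: rows whose leading index levels match any key."""
    # get_locs returns the integer positions matching one partial key.
    positions = [frame.index.get_locs(key) for key in keys]
    # Merge the positions, dropping duplicates and keeping them sorted.
    return frame.iloc[np.unique(np.concatenate(positions))]


subdf = select_partial(df, [('c1', 'd1'), ('c2', 'd2')])
print(subdf)
```

Because the positions are sorted, the rows come back in the frame's original order rather than in the order of the keys.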
Upvotes: 1
Reputation: 323226
This might be overthinking it, but since we need to keep the MultiIndex, you can merge on the key columns and then restore the index:
inputtuple = pd.DataFrame([('c1', 'd1'), ('c2', 'd2')], columns=['cid', 'doc_id'])
out = df.reset_index().merge(inputtuple).set_index(df.index.names)
Out[199]:
                          score
cid doc_id passage_id
c1  d1     p1          0.428390
           p2          0.931326
c2  d2     p1          0.160805
           p2          0.476747
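For anyone who wants to run this end to end, here is a self-contained version. It is a sketch under a few assumptions: the scores are fixed rather than random, the name keys is mine, and the on= and how= arguments are spelled out explicitly even though the merge above relies on the defaults.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question, with fixed scores for reproducibility.
index = pd.MultiIndex.from_product(
    [['c1', 'c2'], ['d1', 'd2'], ['p1', 'p2']],
    names=['cid', 'doc_id', 'passage_id'],
)
df = pd.DataFrame({'score': np.arange(8) / 10}, index=index)

# The (cid, doc_id) pairs to keep, expressed as a small lookup frame.
keys = pd.DataFrame([('c1', 'd1'), ('c2', 'd2')], columns=['cid', 'doc_id'])

out = (
    df.reset_index()                                  # index levels become columns
      .merge(keys, on=['cid', 'doc_id'], how='inner')  # keep only matching pairs
      .set_index(list(df.index.names))                # restore the three-level index
)
print(out)
```

The inner merge keeps the row order of the left frame, so the result stays in the same order as df.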
Upvotes: 1