Reputation: 469
I want to be able to drop rows from a multi-indexed dataframe object using multiple level criteria (with a logical AND joining the criteria).
Consider the pandas dataframe object given by:
import pandas as pd
df = pd.DataFrame(data = [[1,'x'],[2,'x'],[1,'y'],[2,'y']],
index=pd.MultiIndex(levels=[['A','B'],['a','b']],
labels=[[0,1,0,1],[0,1,1,0]],
names=['idx0','idx1']))
print(df)
outputs:
0 1
idx0 idx1
A a 1 x
B b 2 x
A b 1 y
B a 2 y
I wish to eliminate the row where 'idx0'=='A'
and 'idx1'=='a'
, so the end result is:
0 1
idx0 idx1
B b 2 x
a 2 y
A b 1 y
It seems to me as if this cannot be done with the df.drop()
method. A 'roundabout' way which gives the correct result is to do:
df = pd.concat([df.drop(labels='A',level=0),df.drop(labels='a',level=1)])
df = df.drop_duplicates()
But I figure that there has to be a better way...
Upvotes: 5
Views: 3817
Reputation: 42875
To address your question regarding .drop()
- just pass the MultiIndex
labels as tuple
:
df.drop(('A', 'a'))
0 1
idx0 idx1
B b 2 x
A b 1 y
B a 2 y
Upvotes: 5
Reputation: 31672
You could use isin
method for index and take opposite to what are you selecting with ~
:
In [85]: df.index.isin([('A','a')])
Out[85]: array([ True, False, False, False], dtype=bool)
In [86]: df[~df.index.isin([('A','a')])]
Out[86]:
0 1
idx0 idx1
B b 2 x
A b 1 y
B a 2 y
Timing:
In [95]: %timeit df.drop(('A','a'))
1000 loops, best of 3: 1.33 ms per loop
In [96]: %timeit df[~df.index.isin([('A','a')])]
1000 loops, best of 3: 457 us per loop
So drop is almost 3x times slower then with isin
solution.
Upvotes: 2