Reputation: 3619
I have a MultiIndex DataFrame, something like:
import pandas as pd
df = pd.DataFrame(index = pd.MultiIndex.from_product([['mike', 'matt', 'dave', 'frank', 'larry'], range(10)]))
df['foo']="bar"
df.index.names=['people', 'socket']
What I'd like to do is iloc-slice all the rows associated with the first three people in the index, i.e. retrieve all the rows where people is either mike, matt, or dave.
As far as I can tell, though, this is not supported by pandas. I saw some gross levels-related hacks, but they didn't even work. get_level_values(0) doesn't give distinct level values, and levels returns a FrozenList whose order doesn't match the order of the index.
Edit: I should have said that .loc-based solutions won't work for me.
Upvotes: 1
Views: 1096
Reputation: 11409
You can also use df.xs()
"This method takes a key argument to select data at a particular level of a MultiIndex."
Reusing your example:
import pandas as pd
df = pd.DataFrame(index = pd.MultiIndex.from_product([['mike', 'matt', 'dave', 'frank', 'larry'], range(10)], names=['people', 'socket']))
df['foo']="bar"
df.index.names=['people', 'socket']
In [60]: df.xs("mike", level="people")
Out[60]:
        foo
socket
0       bar
1       bar
2       bar
3       bar
4       bar
5       bar
6       bar
7       bar
8       bar
9       bar
In [61]: df.xs(7, level="socket")
Out[61]:
        foo
people
mike    bar
matt    bar
dave    bar
frank   bar
larry   bar
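Note that xs takes a single key per level, so it won't return several people in one call. One possible workaround is to call it once per name with drop_level=False (so the people level is kept) and concatenate the pieces. This is only a sketch on the same example frame; the wanted list and the result name are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_product(
    [['mike', 'matt', 'dave', 'frank', 'larry'], range(10)],
    names=['people', 'socket']))
df['foo'] = 'bar'

# Hypothetical list of names to keep
wanted = ['mike', 'matt', 'dave']

# One cross-section per person; drop_level=False keeps 'people' in the index
pieces = [df.xs(name, level='people', drop_level=False) for name in wanted]
result = pd.concat(pieces)
```

The result keeps both index levels, so it can be sliced further in the same way as the original frame.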
Upvotes: 0
Reputation: 150785
Another option:
df[df.index.get_level_values(0).isin({'matt','mike','dave'})]
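If a strictly positional (iloc) selection is required, the same boolean mask can probably be converted to integer positions first. A minimal sketch, assuming the example frame from the question; np.flatnonzero is my choice here, and the mask, positions and subset names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_product(
    [['mike', 'matt', 'dave', 'frank', 'larry'], range(10)],
    names=['people', 'socket']))
df['foo'] = 'bar'

# Boolean array: True where the 'people' level is one of the wanted names
mask = df.index.get_level_values('people').isin(['mike', 'matt', 'dave'])

# Integer positions of the True entries, usable with iloc
positions = np.flatnonzero(mask)
subset = df.iloc[positions]
```

The positions array can be stored or passed around independently of the labels, which may help when .loc is not an option.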
Upvotes: 0
Reputation: 5740
Here you go:
import pandas as pd
df = pd.DataFrame(index = pd.MultiIndex.from_product([['mike', 'matt', 'dave', 'frank', 'larry'], range(10)], names=['people', 'socket']))
df['foo']="bar"
df.index.names=['people', 'socket']
# get rows
select_rows = df.loc[['mike', 'matt', 'dave']]
Output:
               foo
people socket
mike   0       bar
       1       bar
       2       bar
       3       bar
       4       bar
       5       bar
       6       bar
       7       bar
       8       bar
       9       bar
matt   0       bar
       1       bar
       2       bar
       3       bar
       4       bar
       5       bar
       6       bar
       7       bar
       8       bar
       9       bar
dave   0       bar
       1       bar
       2       bar
       3       bar
       4       bar
       5       bar
       6       bar
       7       bar
       8       bar
       9       bar
Upvotes: 2
Reputation: 863216
One idea is to get the unique values of the first level, take the first three by indexing, and select them with loc:
df = df.loc[df.index.get_level_values(0).unique()[:3]]
Detail:
print (df.index.get_level_values(0).unique()[:3])
Index(['mike', 'matt', 'dave'], dtype='object', name='people')
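Since the question rules out .loc, the same "first three unique values" idea could also be done positionally. This is a sketch rather than a tested recipe for every index: it assumes pd.factorize, which numbers values in order of first appearance, and the codes, uniques and first_three names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_product(
    [['mike', 'matt', 'dave', 'frank', 'larry'], range(10)],
    names=['people', 'socket']))
df['foo'] = 'bar'

# codes[i] is the order-of-appearance number of the person on row i
codes, uniques = pd.factorize(df.index.get_level_values('people'))

# Rows belonging to the first three distinct people, selected by position
first_three = df.iloc[np.flatnonzero(codes < 3)]
```

Here uniques[:3] holds the three names that were kept, without having to hardcode them.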
Upvotes: 0