Reputation: 17894
Here is the sample data file, and I performed the following operation in ipython notebook:
!curl -O http://pbpython.com/extras/sales-funnel.xlsx
df = pd.read_excel('./sales-funnel.xlsx')
df['Status'] = df['Status'].astype('category')
df["Status"].cat.set_categories(["won","pending","presented","declined"],inplace=True)
table = pd.pivot_table(df,
index=['Manager', 'Status'],
values=['Price', 'Quantity'],
columns=['Product'],
aggfunc={'Price':[np.sum, np.mean], 'Quantity':len},
fill_value=0
)
This is what the data looks like in table
:
I want to select (Manager=="Debra Henley") & (Status=="won")
and it works with the query
method:
table.query('(Manager=="Debra Henley") & (Status=="won")')
But how do you perform the same selection with loc
? I tried this but does not work:
table.loc[['Debra Henley', 'won']]
What do you guys usually use when dealing with MultiIndex? What's the best way to do it?
Update: found two solutions so far:
table.xs(('Debra Henley','won'), level=('Manager', 'Status'))
table.loc[[('Debra Henley', 'won')]]
So I guess tuples
should be used instead of lists
when indexing with MultiIndex?
Upvotes: 6
Views: 5898
Reputation: 862481
For simplier selections (only index or only columns) use xs
approach or selecting by tuples
.
Another more general solution with slicers:
idx = pd.IndexSlice
#output is df
print (table.loc[[idx['Debra Henley','won']]])
Quantity Price \
len mean
Product CPU Maintenance Monitor Software CPU Maintenance
Manager Status
Debra Henley won 1 0 0 0 65000 0
sum
Product Monitor Software CPU Maintenance Monitor Software
Manager Status
Debra Henley won 0 0 65000 0 0 0
idx = pd.IndexSlice
#output is series
print (table.loc[idx['Debra Henley','won'],:])
Quantity len CPU 1
Maintenance 0
Monitor 0
Software 0
Price mean CPU 65000
Maintenance 0
Monitor 0
Software 0
sum CPU 65000
Maintenance 0
Monitor 0
Software 0
Name: (Debra Henley, won), dtype: int64
But it is better for more complicated selections - if need filter index and columns together - one xs
doesnt work:
idx = pd.IndexSlice
#select all rows where first level is Debra Henley in index and
#in columns second level is len and sum
print (table.loc[idx['Debra Henley',:], idx[:, ['len', 'sum'], :]])
Quantity Price \
len sum
Product CPU Maintenance Monitor Software CPU
Manager Status
Debra Henley won 1 0 0 0 65000
pending 1 2 0 0 40000
presented 1 0 0 2 30000
declined 2 0 0 0 70000
Product Maintenance Monitor Software
Manager Status
Debra Henley won 0 0 0
pending 10000 0 0
presented 0 0 20000
declined 0 0 0
Upvotes: 3
Reputation: 294218
Your canonical answer is provided by @ScottBoston.
I'll add this for breadth and perspective in addition to @jezrael's IndexSlice
approach.
You can also use pd.DataFrame.xs
to take a cross-section
table.xs(['Debra Henley', 'won'])
Product
Quantity len CPU 1
Maintenance 0
Monitor 0
Software 0
Price mean CPU 65000
Maintenance 0
Monitor 0
Software 0
sum CPU 65000
Maintenance 0
Monitor 0
Software 0
Name: (Debra Henley, won), dtype: int64
Upvotes: 6
Reputation: 153460
Yes, you can use:
table.loc[[('Debra Henley', 'won')]]
to return a pandas data frame or you can use:
table.loc[('Debra Henley','won')]
to return a pandas series.
You can refer to the this documentation.
Upvotes: 1