Reputation: 3619
I generate a MultiIndex DataFrame like this example:
import pandas as pd
import numpy as np
iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)
Each top-level column has a sub-level column called 'weight'.
I need to generate a list or (preferably) Series that contains, for a given row, all the 'weight' sub-columns in that row. In the example picture, I'd want a Series that gave me 0.02, 0.46, 0.33, 0.47.
Can anyone suggest a nice way to do this? The solutions I've thought of are all gross, and I suspect I have an incomplete understanding of the indexing capabilities of pandas.
Upvotes: 3
Views: 2481
Reputation: 394099
IIUC, you can use loc and pass a tuple consisting of a slice and a column label to access the column of interest at that level:
In [59]:
iterables = [ ['co1', 'co2', 'co3', 'co4'], ['age','weight'] ]
multi = pd.MultiIndex.from_product(iterables, names= ["Spread", "attribute"])
df = pd.DataFrame(np.random.rand(80).reshape(10,8),index = range(0,10), columns = multi)
df
Out[59]:
Spread          co1                 co2                 co3            \
attribute       age    weight       age    weight       age    weight
0          0.600947  0.509537  0.605538  0.496002  0.215206  0.075079
1          0.152956  0.922832  0.167788  0.024761  0.622378  0.983030
2          0.712478  0.603798  0.407014  0.625474  0.445592  0.903240
3          0.420569  0.576604  0.220097  0.401624  0.929464  0.512026
4          0.273088  0.032303  0.607577  0.836231  0.751845  0.181522
5          0.859699  0.274760  0.456812  0.666109  0.349961  0.237894
6          0.632754  0.603252  0.157416  0.221576  0.068355  0.121864
7          0.090595  0.035526  0.698262  0.525770  0.792618  0.220601
8          0.670236  0.805195  0.310680  0.100464  0.875299  0.853238
9          0.020501  0.405245  0.447614  0.999340  0.659616  0.709312

Spread          co4
attribute       age    weight
0          0.297421  0.415730
1          0.235259  0.156014
2          0.365762  0.198299
3          0.695431  0.478457
4          0.331657  0.338436
5          0.943810  0.097999
6          0.638720  0.033747
7          0.646969  0.475316
8          0.623225  0.024976
9          0.023494  0.959514
In [61]:
df.loc[1,(slice(None),'weight')]
Out[61]:
Spread attribute
co1 weight 0.922832
co2 weight 0.024761
co3 weight 0.983030
co4 weight 0.156014
Name: 1, dtype: float64
To explain the syntax:
df.loc[1,(slice(None),'weight')]
The first param is just your index label. The second param is a tuple consisting of a slice and a column label: the first member, slice(None), selects all the first-level columns ('co1' to 'co4'), and the second member selects the columns at the next level that match the label 'weight'.
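For comparison, here is a minimal sketch of two other spellings that should give the same values: pd.IndexSlice, which lets you write the slice with ordinary colon syntax, and DataFrame.xs, which takes a cross-section on a named level. The variable names (weights, weights_idx, weights_xs) and the seeded generator are my own additions for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

iterables = [['co1', 'co2', 'co3', 'co4'], ['age', 'weight']]
multi = pd.MultiIndex.from_product(iterables, names=["Spread", "attribute"])
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable (assumption)
df = pd.DataFrame(rng.random((10, 8)), index=range(10), columns=multi)

# Form used in the answer: tuple of (slice over level 0, label on level 1)
weights = df.loc[1, (slice(None), 'weight')]

# Same selection written with pd.IndexSlice
idx = pd.IndexSlice
weights_idx = df.loc[1, idx[:, 'weight']]

# Cross-section on the 'attribute' level, then pick row 1.
# xs drops the selected level by default, so the result is indexed by 'Spread' only.
weights_xs = df.xs('weight', axis=1, level='attribute').loc[1]
```

The first two results keep both column levels in the Series index; the xs version keeps only 'co1' to 'co4', which may be more convenient if you just want one label per value.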
Upvotes: 6