kjo
kjo

Reputation: 35311

Selecting by subset of multiindex level

Toy example

Suppose I have the dataframe df shown below

           C
L0 L1 L2    
0  w  P   11
      Q    9
      R   21
      S    4
   x  P    3
      Q    0
      R   23
      S   20
   y  P   19
      Q    0
      R    7
      S   13
   z  P   17
      Q    0
      R    5
      S    1
1  w  P    8
      Q    2
      R   12
      S    0
   x  P   22
      Q   14
      R    2
      S   18
   y  P    6
      Q    0
      R   16
      S   15
   z  P   10
      Q    0
      R    8
      S    0

Note that the rows of df are indexed by 3-level multi-index.

I can find the minimum of the C column for each value of the L2 level, as follows:

In [58]: df.groupby(level='L2').min()
Out[58]: 
    C
L2   
P   3
Q   0
R   2
S   0

Similarly, the following expression shows the values of L2 for which this minimum is greater than 0:

In [59]: df.groupby(level='L2').min() > 0
Out[59]: 
        C
L2       
P    True
Q   False
R    True
S   False

Question: How can I select the rows of the original dataframe df corresponding to the values of L2 for which the minimum of C is greater than 0?


In this simple example the problem boils down to selecting the rows of df whose L2 value is either 'P' or 'R'. Therefore, it would not be difficult to brute-force the problem by concatenating the rows for L2='P' with those for L2='R'.

In the application I have in mind, however, such a solution becomes unwieldy, since here the L2 level has ~2000 values, and for about half of them, the minimum is greater than 0.

Therefore, I'm looking for a more scalable way to solve this selection problem.

DATA

L0  L1  L2  C
0   w   P   11
0   w   Q   9
0   w   R   21
0   w   S   4
0   x   P   3
0   x   Q   0
0   x   R   23
0   x   S   20
0   y   P   19
0   y   Q   0
0   y   R   7
0   y   S   13
0   z   P   17
0   z   Q   0
0   z   R   5
0   z   S   1
1   w   P   8
1   w   Q   2
1   w   R   12
1   w   S   0
1   x   P   22
1   x   Q   14
1   x   R   2
1   x   S   18
1   y   P   6
1   y   Q   0
1   y   R   16
1   y   S   15
1   z   P   10
1   z   Q   0
1   z   R   8
1   z   S   0

Upvotes: 1

Views: 800

Answers (1)

Zero
Zero

Reputation: 76917

Here's one way

Store the L2 values needed

In [413]: m = df.groupby(level='L2').min() > 0

In [414]: m
Out[414]:
        C
L2
P    True
Q   False
R    True
S   False

Using, get_level_values(index_level_name).isin(to_be_filtered) filter only needed values.

In [415]: df[df.index.get_level_values('L2').isin(m.loc[m.C, :].index)]
Out[415]:
           C
L0 L1 L2
0  w  P   11
      R   21
   x  P    3
      R   23
   y  P   19
      R    7
   z  P   17
      R    5
1  w  P    8
      R   12
   x  P   22
      R    2
   y  P    6
      R   16
   z  P   10
      R    8

Here's another way. Using, transform and subset selection

In [430]: df[(df.groupby(level='L2').transform('min') > 0).C]
Out[430]:
           C
L0 L1 L2
0  w  P   11
      R   21
   x  P    3
      R   23
   y  P   19
      R    7
   z  P   17
      R    5
1  w  P    8
      R   12
   x  P   22
      R    2
   y  P    6
      R   16
   z  P   10
      R    8

Details

In [416]: m.loc[m.C, :].index
Out[416]: Index([u'P', u'R'], dtype='object', name=u'L2')

Upvotes: 2

Related Questions