Reputation: 35311
Toy example
Suppose I have the dataframe df
shown below
C
L0 L1 L2
0 w P 11
Q 9
R 21
S 4
x P 3
Q 0
R 23
S 20
y P 19
Q 0
R 7
S 13
z P 17
Q 0
R 5
S 1
1 w P 8
Q 2
R 12
S 0
x P 22
Q 14
R 2
S 18
y P 6
Q 0
R 16
S 15
z P 10
Q 0
R 8
S 0
Note that the rows of df
are indexed by 3-level multi-index.
I can find the minimum of the C
column for each value of the L2
level, as follows:
In [58]: df.groupby(level='L2').min()
Out[58]:
C
L2
P 3
Q 0
R 2
S 0
Similarly, the following expression shows the values of L2
for which this minimum is greater than 0:
In [59]: df.groupby(level='L2').min() > 0
Out[59]:
C
L2
P True
Q False
R True
S False
Question: How can I select the rows of the original dataframe df
corresponding to the values of L2
for which the minimum of C
is greater than 0?
In this simple example the problem boils down to selecting the rows of df
whose L2
value is either 'P'
or 'R'
. Therefore, it would not be difficult to brute-force the problem by concatenating the rows for L2='P'
with those for L2='R'
.
In the application I have in mind, however, such a solution becomes unwieldy, since here the L2
level has ~2000 values, and for about half of them, the minimum is greater than 0.
Therefore, I'm looking for a more scalable way to solve this selection problem.
DATA
L0 L1 L2 C
0 w P 11
0 w Q 9
0 w R 21
0 w S 4
0 x P 3
0 x Q 0
0 x R 23
0 x S 20
0 y P 19
0 y Q 0
0 y R 7
0 y S 13
0 z P 17
0 z Q 0
0 z R 5
0 z S 1
1 w P 8
1 w Q 2
1 w R 12
1 w S 0
1 x P 22
1 x Q 14
1 x R 2
1 x S 18
1 y P 6
1 y Q 0
1 y R 16
1 y S 15
1 z P 10
1 z Q 0
1 z R 8
1 z S 0
Upvotes: 1
Views: 800
Reputation: 76917
Here's one way
Store the L2
values needed
In [413]: m = df.groupby(level='L2').min() > 0
In [414]: m
Out[414]:
C
L2
P True
Q False
R True
S False
Using, get_level_values(index_level_name).isin(to_be_filtered)
filter only needed values.
In [415]: df[df.index.get_level_values('L2').isin(m.loc[m.C, :].index)]
Out[415]:
C
L0 L1 L2
0 w P 11
R 21
x P 3
R 23
y P 19
R 7
z P 17
R 5
1 w P 8
R 12
x P 22
R 2
y P 6
R 16
z P 10
R 8
Here's another way. Using, transform and subset selection
In [430]: df[(df.groupby(level='L2').transform('min') > 0).C]
Out[430]:
C
L0 L1 L2
0 w P 11
R 21
x P 3
R 23
y P 19
R 7
z P 17
R 5
1 w P 8
R 12
x P 22
R 2
y P 6
R 16
z P 10
R 8
Details
In [416]: m.loc[m.C, :].index
Out[416]: Index([u'P', u'R'], dtype='object', name=u'L2')
Upvotes: 2