I have a dataframe created like this: import pandas as pd df = pd.DataFrame({ 'ind1': list('AAABBBCCC'), 'ind2': list(map(int, list('123123123'))), 'val1': [0, 0, 0, -1, -4, 5, 10, 11, 4], 'val2': [0.1, 0.2, -0.2, 0.1, 0.2, 0.2, -0.1, 2, 0.1] }) df = df.set_index(['ind1', 'ind2']) Resulting data: val1 val2 ind1 ind2 A 1 0 0.1 2 0 0.2 3 0 -0.2 B 1 -1 0.1 2 -4 0.2 3 5 0.2 C 1 10 -0.1 2 11 2.0 3 4 0.1 I want to select all entries where: At least one item in val1 is unequal 0 Each absolute value in val2 is < 0.5 In the example above, therefore only B 1 -1 0.1 2 -4 0.2 3 5 0.2 should remain. I cannot use sum() as the values can be positive and negative, so something like this df.reset_index().groupby('ind1').sum() ind2 val1 val2 ind1 A 6 0 0.1 B 6 0 0.5 C 6 25 2.0 would not work. How would I use any() and all() here?

Reputation: 25997

How to select from multiindex based on individual values in several columns?

I have a dataframe created like this:

import pandas as pd

df = pd.DataFrame({
    'ind1': list('AAABBBCCC'),
    'ind2': list(map(int, list('123123123'))),
    'val1': [0, 0, 0, -1, -4, 5, 10, 11, 4],
    'val2': [0.1, 0.2, -0.2, 0.1, 0.2, 0.2, -0.1, 2, 0.1]
})

df = df.set_index(['ind1', 'ind2'])

Resulting data:

           val1  val2
ind1 ind2            
A    1        0   0.1
     2        0   0.2
     3        0  -0.2
B    1       -1   0.1
     2       -4   0.2
     3        5   0.2
C    1       10  -0.1
     2       11   2.0
     3        4   0.1

I want to select all entries where:

At least one item in val1 is unequal 0
Each absolute value in val2 is < 0.5

In the example above, therefore only

B    1       -1   0.1
     2       -4   0.2
     3        5   0.2

should remain.

I cannot use sum() as the values can be positive and negative, so something like this

df.reset_index().groupby('ind1').sum()

      ind2  val1  val2
ind1                  
A        6     0   0.1
B        6     0   0.5
C        6    25   2.0

would not work.

How would I use any() and all() here?

Upvotes: 1

Answers (3)

Quang Hoang

Reputation: 150735

This works for me, similar to Wen's solution:

mask = df.abs().groupby(level=0).transform('max')
df[mask.val1.gt(0)&mask.val2.lt(0.5)]

Output:

           val1  val2
ind1 ind2            
B    1       -1   0.1
     2       -4   0.2
     3        5   0.2

Upvotes: 1

BENY

Reputation: 323226

Without lambda by transform

s1=df.val1.ne(0).groupby(level=0).transform('any')
s2=df.val2.abs().lt(0.5).groupby(level=0).transform('all')
df[s1&s2]
Out[583]: 
           val1  val2
ind1 ind2            
B    1       -1   0.1
     2       -4   0.2
     3        5   0.2

Upvotes: 2

Peter Leimbigler

Reputation: 11105

One way is via groupby().filter() (link to docs), which evaluates a boolean condition on each group (as opposed to each DataFrame row):

df.groupby('ind1').filter(lambda x: x['val1'].any() & 
                                   (x['val2'].abs() < 0.5).all())

           val1  val2
ind1 ind2            
B    1       -1   0.1
     2       -4   0.2
     3        5   0.2

Note that DataFrame.groupby.filter() is unrelated to the same-named method DataFrame.filter()!

Upvotes: 1

How to select from multiindex based on individual values in several columns?

Answers (3)

Related Questions