Pandas: filter result of column summation

Question

Using a pandas dataframe, for example:

import pandas as pd
df = pd.DataFrame({'a': [1,0,0], 'b': [1,0,0]})

I have used the answer from Pandas: sum DataFrame rows for given columns to sum the two columns:

foo = df[['a', 'b']].sum(axis=1)

What I'm struggling with now is how to filter the rows that are assigned to foo. So, for example, I only want the rows that are greater than 0 to be in the result stored in foo. Does anyone know the best way of doing this?

jezrael · Accepted Answer

Use:

foo = df[['a', 'b']]

mask = foo.gt(0).all(axis=1)

out = foo[mask].sum(axis=1)
print (out)
0    2
dtype: int64

Details:

Compare with DataFrame.gt (the method form of >) to flag the values greater than 0:

print (foo.gt(0))
       a      b
0   True   True
1  False  False
2  False  False

Then use DataFrame.all to test whether all values in each row are True. If at least one True per row is enough (here, at least one value greater than 0 in the row), use DataFrame.any instead:

print (foo.gt(0).all(axis=1))
0     True
1    False
2    False
dtype: bool
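
In the question's sample data both columns are identical, so all and any produce the same mask. A minimal sketch of where they differ, using a hypothetical frame df2 (not from the question) in which one row has only a single value greater than 0:

```python
import pandas as pd

# Hypothetical data (not from the question): row 1 has only one value > 0
df2 = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 2, 0]})

mask_all = df2.gt(0).all(axis=1)  # True only if every column in the row is > 0
mask_any = df2.gt(0).any(axis=1)  # True if at least one column in the row is > 0

print(mask_all.tolist())  # [True, False, False]
print(mask_any.tolist())  # [True, True, False]
```

Row 1 is dropped by the all mask but kept by the any mask, so the choice depends on whether every summed column must be positive or just one of them.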

But if you want to filter by foo, use boolean indexing. Because foo and df share the same index, create the mask from foo and use it to filter the original DataFrame:

foo = df[['a', 'b']].sum(axis=1)

df = df[foo.gt(0)]
print (df)
   a  b
0  1  1

Detail:

print (foo.gt(0))
0     True
1    False
2    False
dtype: bool
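
If the goal is to keep only the positive sums in foo itself rather than to filter df, the same mask can presumably be applied to the Series directly. A small sketch under that reading of the question, using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 0, 0]})

# Row-wise sum of the two columns, as in the question
foo = df[['a', 'b']].sum(axis=1)

# Boolean indexing on the Series keeps only the sums greater than 0
foo = foo[foo.gt(0)]
print(foo)
# 0    2
# dtype: int64
```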
