t_warsop

Reputation: 1280

Pandas: filter result of column summation

Using a pandas dataframe, for example:

import pandas as pd
df = pd.DataFrame({'a': [1,0,0], 'b': [1,0,0]})

I have used the answer from Pandas: sum DataFrame rows for given columns to sum the two columns:

foo = df[['a', 'b']].sum(axis=1)

What I'm struggling with now is how to filter the rows that are assigned to foo. For example, I only want the rows whose sum is greater than 0 to be in the result stored in foo. Does anyone know the best way of doing this?

Upvotes: 3

Views: 1645

Answers (2)

Rahul charan

Reputation: 837

You can use basic pandas operations such as a boolean condition and dropna.

import pandas as pd

df = pd.DataFrame({'a': [1,0,0], 'b': [1,0,0]})
foo = df[['a', 'b']].sum(axis=1)
foo = pd.DataFrame(foo)  # Convert foo into a DataFrame
foo = foo[foo > 0]  # Apply the condition; values that fail it become NaN
foo.dropna(axis=0, inplace=True)  # Drop the NaN rows
foo.columns = ['Result']   # Rename the column
foo

Output

    Result
0     2.0
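The 2.0 above is a float because masking a DataFrame with a boolean DataFrame fills the non-matching cells with NaN before dropna removes them. As a minimal sketch of an alternative, assuming you are happy to keep foo as a Series, boolean indexing on the Series itself drops the unwanted rows directly and keeps the integer dtype. The name positive is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 0, 0]})

# Row-wise sum of the two columns (a Series indexed like df)
foo = df[['a', 'b']].sum(axis=1)

# Boolean indexing on the Series keeps only the sums greater than 0
positive = foo[foo > 0]
print(positive)
```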

I hope it may help you.

Upvotes: 1

jezrael

Reputation: 863226

Use:

foo = df[['a', 'b']]

mask = foo.gt(0).all(axis=1)

out = foo[mask].sum(axis=1)
print (out)
0    2
dtype: int64

Details:

Compare with DataFrame.gt (equivalent to >) to test which values are greater than 0:

print (foo.gt(0))
       a      b
0   True   True
1  False  False
2  False  False

Then use DataFrame.all to test whether all values in each row are True. It is also possible to use DataFrame.any if you need to test for at least one True, which here means at least one value greater than 0 per row:

print (foo.gt(0).all(axis=1))
0     True
1    False
2    False
dtype: bool
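With the sample data from the question, all and any give the same mask, so the difference is easy to miss. Here is a small sketch using a hypothetical frame df2 (not from the question) in which row 1 has only one positive value. Note that for non-negative data, a row sum greater than 0 corresponds to any rather than all.

```python
import pandas as pd

# Hypothetical data chosen so that row 1 has only one positive value
df2 = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 2, 0]})

all_mask = df2.gt(0).all(axis=1)  # True only if every value in the row is > 0
any_mask = df2.gt(0).any(axis=1)  # True if at least one value in the row is > 0

print(df2[all_mask].sum(axis=1))  # keeps row 0 only
print(df2[any_mask].sum(axis=1))  # keeps rows 0 and 1
```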

But if you want to filter by foo, use boolean indexing. Because foo and df share the same index, create the mask from foo and filter the original DataFrame:

foo = df[['a', 'b']].sum(axis=1)

df = df[foo.gt(0)]
print (df)
   a  b
0  1  1

Detail:

print (foo.gt(0))
0     True
1    False
2    False
dtype: bool
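If you need both the filtered rows and their sums together, one possible sketch is to reuse the same mask for the DataFrame and for foo. The column name total and the variable out are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 0, 0]})
foo = df[['a', 'b']].sum(axis=1)

# One mask, built from the row sums, shared by df and foo
mask = foo.gt(0)

# Filter the rows and attach the matching sums as a new column
out = df.loc[mask].assign(total=foo[mask])
print(out)
```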

Upvotes: 1
