Isidora

Reputation: 39

Select rows where two or more columns are bigger than 0 in pandas

I am working with a dataframe in pandas. My dataframe has 55 columns and 70,000 rows.

How can I select the rows where two or more values are bigger than 0?

It now looks like this:

   A   B  C   D   E
a  0   2  0   8   0
b  3   0  0   0   0
c  6   2  5   0   0

And I would like to get this:

   A   B  C   D   E  F
a  0   2  0   8   0  true
b  3   0  0   0   0  false
c  6   2  5   0   0  true

I have tried converting it to just 0s and 1s and summing that, like so:

df[df > 0] = 1

df[(df > 0).sum(axis=1) >= 2]

But then I lose all the other info in the dataframe and I still want to be able to see the original values.

Upvotes: 1

Views: 1611

Answers (2)

U13-Forward

Reputation: 71570

Try assigning to a column like this:

>>> df['F'] = df.gt(0).sum(axis=1).ge(2)
>>> df
   A  B  C  D  E      F
a  0  2  0  8  0   True
b  3  0  0  0  0  False
c  6  2  5  0  0   True

Or try with astype(bool), which counts non-zero values (so negative values would count too):

>>> df['F'] = df.astype(bool).sum(axis=1).ge(2)
>>> df
   A  B  C  D  E      F
a  0  2  0  8  0   True
b  3  0  0  0  0  False
c  6  2  5  0  0   True
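
The two variants above agree on the question's data, but they are not equivalent in general. A minimal sketch of the difference, assuming a hypothetical frame that contains a negative value (this data is made up for illustration and is not from the question):

```python
import pandas as pd

# Hypothetical data with one negative value (not from the question)
df = pd.DataFrame(
    {"A": [0, -3, 6], "B": [2, 4, 2], "C": [0, 0, 5]},
    index=["a", "b", "c"],
)

# Counts values strictly greater than 0 in each row
positive = df.gt(0).sum(axis=1).ge(2)

# Counts values that are non-zero in each row, negatives included
nonzero = df.astype(bool).sum(axis=1).ge(2)

print(positive.tolist())  # [False, False, True]
print(nonzero.tolist())   # [False, True, True]
```

Row b has only one positive value but two non-zero values, so the two masks disagree there. If the data can contain negatives, gt(0) is probably the one that matches "bigger than 0".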

Upvotes: 2

jezrael

Reputation: 862611

You are close; just assign the mask to a new column instead of overwriting the values:

df['F'] = (df > 0).sum(axis=1) >= 2

Or, if the data has no negative values, count the non-zero entries with NumPy:

import numpy as np

df['F'] = np.count_nonzero(df, axis=1) >= 2
print(df)
   A  B  C  D  E      F
a  0  2  0  8  0   True
b  3  0  0  0  0  False
c  6  2  5  0  0   True
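
For completeness, here is a self-contained sketch that rebuilds the sample frame from the question and uses the same mask both to select rows and to add the flag column. The variable names mask, selected and flagged are only illustrative. The key point is that the mask is computed from a comparison, so df itself is never overwritten:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"A": [0, 3, 6], "B": [2, 0, 2], "C": [0, 0, 5], "D": [8, 0, 0], "E": [0, 0, 0]},
    index=["a", "b", "c"],
)

# Boolean Series: True where a row has two or more values greater than 0.
# (df > 0) creates a new boolean frame, so df keeps its original values.
mask = (df > 0).sum(axis=1) >= 2

# Keep only the matching rows, with the original values intact
selected = df[mask]

# Or keep every row and attach the mask as a new column on a copy
flagged = df.assign(F=mask)

print(selected)
print(flagged)
```

Computing the mask once from the original columns also avoids accidentally including F in the count if the assignment is run a second time.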

Upvotes: 2
