Reputation: 39
I am working with a dataframe in pandas. My dataframe had 55 columns and 70.000 rows.
How can I select the rows where two or more values are bigger than 0?
It now looks like this:
A B C D E
a 0 2 0 8 0
b 3 0 0 0 0
c 6 2 5 0 0
And would like to make this:
A B C D E F
a 0 2 0 8 0 true
b 3 0 0 0 0 false
c 6 2 5 0 0 true
Have tried converting it to just 0's and 1's and summing that, like so:
df[df > 0] = 1
df[(df > 0).sum(axis=1) >= 2]
But then I lose all the other info in the dataframe and I still want to be able to see the original values.
Upvotes: 1
Views: 1611
Reputation: 71570
Try assigning to a column like this:
>>> df['F'] = df.gt(0).sum(axis=1).ge(2)
>>> df
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
Or try with astype(bool)
:
>>> df['F'] = df.astype(bool).sum(axis=1).ge(2)
>>> df
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
>>>
Upvotes: 2
Reputation: 862611
You are close, only assign mask to new column:
df['F'] = (df > 0).sum(axis=1) >= 2
Or:
df['F'] = np.count_nonzero(df, axis=1) >= 2
print (df)
A B C D E F
a 0 2 0 8 0 True
b 3 0 0 0 0 False
c 6 2 5 0 0 True
Upvotes: 2