bodokaiser

Reputation: 15742

How to count conditions across a row?

I have a pandas data frame:

df = pd.DataFrame({
    'a': [1, 2, 0, 3],
    'b': [1, 2, 0, 0],
    'c': [5, 2, 0, 3],
    'd': [0, 3, 7, 1]
})

I would now like to create another column n which counts how many values of columns ['a', 'b', 'c', 'd'] are > 0.

By hand we need to do:

df['n'] = [3, 2, 3, 3]

Obviously this is impractical for larger frames. I know we can get the boolean conditions we are interested in with df.a > 0, ..., df.d > 0.

Unfortunately I am not able to convert the resulting boolean values to 0 and 1 and sum them.

df['n'] = df.a > 0 + df.b > 0 + df.c > 0 + df.d > 0

Throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I correct this?

Upvotes: 1

Views: 158

Answers (1)

EdChum

Reputation: 393973

You could use a list comprehension that loops over the columns, applies a boolean condition to each column, drops the values that don't meet the condition, and calls count:

In [360]:

[df.loc[df[col]>0,col].dropna().count() for col in df]
Out[360]:
[3, 2, 3, 3]

This would yield the column:

In [361]:

df['n'] = [df.loc[df[col]>0,col].dropna().count() for col in df]
df
Out[361]:
   a  b  c  d  n
0  1  1  5  0  3
1  2  2  2  3  2
2  0  0  0  7  3
3  3  0  3  1  3

At this stage it may make sense to label your rows with the column names so that n makes sense.
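As a rough sketch of that relabeling, assuming the frame from the question: the name `labeled` is made up for illustration, and the trick only works here because the example frame happens to have as many rows as columns.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 0, 3],
    'b': [1, 2, 0, 0],
    'c': [5, 2, 0, 3],
    'd': [0, 3, 7, 1],
})

# Per-column counts of values > 0, computed as in the list comprehension above.
counts = [df.loc[df[col] > 0, col].count() for col in df]

# `labeled` is a hypothetical name. Reusing the column names as row labels
# is only possible because this frame has 4 rows and 4 columns.
labeled = df.copy()
labeled.index = list(df.columns)
labeled['n'] = counts

print(labeled)
```

With the rows labeled this way, the value of n in row 'b' is the count for column 'b', which is easier to read than a count sitting next to an unrelated integer index.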

EDIT

I realised on my way to lunch that there is a simpler method, just calling count on the masked frame:

In [365]:

df[df>0].count()
Out[365]:
a    3
b    2
c    3
d    3
dtype: int64
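On the original error: `+` binds more tightly than `>`, so the expression in the question is parsed as a chained comparison of Series (`df.a > (0 + df.b) > ...`), which forces pandas to evaluate a Series as a single truth value. A minimal sketch of one way around it, assuming the frame from the question, is to parenthesise each comparison or to compare the whole frame at once and sum the booleans. The names `positive`, `per_column`, `per_row` and `by_hand` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 0, 3],
    'b': [1, 2, 0, 0],
    'c': [5, 2, 0, 3],
    'd': [0, 3, 7, 1],
})

cols = ['a', 'b', 'c', 'd']

# Boolean frame: True wherever the value is > 0.
positive = df[cols] > 0

# Summing booleans treats True as 1 and False as 0.
per_column = positive.sum()        # one count per column
per_row = positive.sum(axis=1)     # one count per row

# The question's expression with explicit parentheses; astype(int) is used
# because adding two boolean Series yields booleans rather than a count.
by_hand = ((df.a > 0).astype(int) + (df.b > 0).astype(int)
           + (df.c > 0).astype(int) + (df.d > 0).astype(int))

print(per_column.tolist())
print(per_row.tolist())
```

Note that the by-hand values in the question, [3, 2, 3, 3], are the per-column counts. Counting across each row of the same data gives different numbers, so it is worth checking which of the two is actually wanted before assigning the result to n.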

Upvotes: 2
