Reputation: 15742
I have a pandas data frame:
df = pd.DataFrame({
'a': [1, 2, 0, 3],
'b': [1, 2, 0, 0],
'c': [5, 2, 0, 3],
'd': [0, 3, 7, 1]
})
I would now like to create another column n which counts how many values of columns ['a', 'b', 'c', 'd'] are > 0.
By hand, we would have to write:
df['n'] = [3, 2, 3, 3]
Needless to say, this is impractical for larger frames. I know we can select the rows we are interested in with df.a > 0, ..., df.d > 0.
Unfortunately, I am not able to convert the resulting bool values to 0 and 1 and sum them.
df['n'] = df.a > 0 + df.b > 0 + df.c > 0 + df.d > 0
This throws:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How do I correct this?
Upvotes: 1
Views: 158
Reputation: 393973
You could use a list comprehension that loops over the columns, applies a boolean condition to each column, drops the values that don't meet the condition, and calls count on what remains:
In [360]:
[df.loc[df[col]>0,col].dropna().count() for col in df]
Out[360]:
[3, 2, 3, 3]
This would yield the column:
In [361]:
df['n'] = [df.loc[df[col]>0,col].dropna().count() for col in df]
df
Out[361]:
   a  b  c  d  n
0  1  1  5  0  3
1  2  2  2  3  2
2  0  0  0  7  3
3  3  0  3  1  3
At this stage it may make sense to relabel your rows with the column names so that your n column makes sense, since each value in n is the count for one column.
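One way to keep each count attached to the column it describes is to build the result as a Series indexed by column name rather than as a bare list. The following is only a sketch of that idea; the name per_column is made up for illustration and is not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 0, 3],
    'b': [1, 2, 0, 0],
    'c': [5, 2, 0, 3],
    'd': [0, 3, 7, 1],
})

# Count the positive values in each column. The resulting Series is
# indexed by column name, so every count carries its own label.
# (per_column is a hypothetical name used only for this sketch.)
per_column = pd.Series(
    {col: int((df[col] > 0).sum()) for col in df.columns},
    name='n',
)
print(per_column)
```

Keeping the counts in a separate labelled Series avoids relying on the frame happening to have as many rows as columns.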
EDIT
I realised on my way to lunch that there is a simpler method, just calling count on the masked frame:
In [365]:
df[df>0].count()
Out[365]:
a 3
b 2
c 3
d 3
dtype: int64
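As for the ValueError in the question: in Python, + binds more tightly than >, so the original expression is parsed as the chained comparison df.a > (0 + df.b) > (0 + df.c) > ..., and chaining comparisons forces each intermediate Series into a single True/False, which pandas refuses to do. A minimal sketch of the fix is below, assuming the goal of that expression was a count per row. Each comparison is parenthesised and cast to int first, because adding two boolean Series directly acts as a logical OR rather than a sum. The names cols, per_row_manual, per_row and per_column are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 0, 3],
    'b': [1, 2, 0, 0],
    'c': [5, 2, 0, 3],
    'd': [0, 3, 7, 1],
})
cols = ['a', 'b', 'c', 'd']

# Corrected form of the expression from the question: parenthesise each
# comparison and cast the booleans to integers before adding them.
per_row_manual = (
    (df.a > 0).astype(int)
    + (df.b > 0).astype(int)
    + (df.c > 0).astype(int)
    + (df.d > 0).astype(int)
)

# Vectorised equivalent: compare the whole frame at once and sum the
# boolean mask along each row (True counts as 1, False as 0).
per_row = (df[cols] > 0).sum(axis=1)

# Summing down the columns instead gives the same counts as
# df[df > 0].count() does for this frame.
per_column = (df[cols] > 0).sum()

print(per_row.tolist())
print(per_column.tolist())
```

For this frame the per-row counts are 3, 4, 1, 3, whereas the per-column counts are the 3, 2, 3, 3 given in the question, so it is worth checking which of the two is actually wanted before assigning the result to df['n'].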
Upvotes: 2