Reputation: 177
I have a dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))
and would like to add the count of positive entries in specified columns ('X', 'Y', 'Z') as a new column of the dataframe.
What's the best way of doing this?
Upvotes: 3
Views: 3960
Reputation: 5215
Here's one way to do it:
df['COUNT'] = df.select_dtypes(include='float64').gt(0).sum(axis=1)
#   NAME         X         Y         Z  COUNT
# 0    A -0.033066 -1.064625 -0.299286      0
# 1    B  0.902976 -1.703256 -0.011417      1
# 2    C -2.537364 -0.216643  1.051398      1
# 3    D  1.073677 -1.486599 -0.827829      1
# 4    E  2.157901  0.425371 -1.659263      2
# 5    F -1.589662 -0.382535  0.454324      1
# 6    G  0.487965  0.279265  0.820486      3
# 7    H  0.496104 -0.680161  0.763793      2
# 8    I -0.034518 -0.479307 -0.071954      0
# 9    J -0.170412  0.558505 -1.742784      1
The select_dtypes method is pretty self-explanatory: it filters the dataframe to columns of a given dtype, which is useful in cases like this because you don't need to worry about column names.
The .gt method tests each value for being greater than its argument (in this case 0) and returns boolean values. Since True counts as 1 when summed, the row-wise sum (axis=1) gives the number of values in each row that meet the criterion.
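Since the question asks about specified columns, here is a minimal sketch of the same idea with an explicit column list instead of a dtype filter. The variable name cols and the small hand-made frame are assumptions for illustration only, not the random data above:

```python
import pandas as pd

# Small hand-made frame (illustrative values, not the random data above)
df = pd.DataFrame({
    'NAME': list('ABC'),
    'X': [1.5, -0.2, 0.0],
    'Y': [-1.0, 0.3, 2.0],
    'Z': [0.7, 0.4, -3.0],
})

cols = ['X', 'Y', 'Z']  # columns to inspect (hypothetical name)

# gt(0) yields a boolean frame; summing along axis=1 counts True per row
df['COUNT'] = df[cols].gt(0).sum(axis=1)
```

Note that gt(0) is a strict comparison, so an entry of exactly 0.0 is not counted as positive. Naming the columns also avoids picking up any other float64 column that might be added to the frame later.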
Upvotes: 9
Reputation: 177
I think I found a solution, so I'm posting here for future reference.
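import numpy as np
import pandas as pd

np.random.seed(11)
df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))
# every column except NAME
cols = df.columns.difference(['NAME'])
# masking turns non-positive entries into NaN; count() tallies the non-NaN values per row
df['COUNT'] = df[df[cols] > 0].count(axis=1)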
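As far as I can tell, this works because indexing with the boolean frame keeps only the positive entries and sets everything else to NaN, including the NAME column, which the mask does not cover. A rough sketch comparing it with summing the boolean mask directly (the names masked, via_count and via_sum are made up for this example):

```python
import numpy as np
import pandas as pd

np.random.seed(11)
df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))
cols = df.columns.difference(['NAME'])

# Entries where the mask is not True become NaN
masked = df[df[cols] > 0]

# count() tallies the non-NaN entries in each row
via_count = masked.count(axis=1)

# Summing the boolean mask counts the True values without the masking step
via_sum = (df[cols] > 0).sum(axis=1)
```

Both series should hold the same per-row counts; the second form skips building the intermediate masked frame.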
Upvotes: 1