Ali

Reputation: 177

Dataframe count of positive values in range as a new column

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))

and would like to add the count of positive entries in specified columns ('X', 'Y', 'Z') as a new column in the dataframe.

What's the best way of doing this?

Upvotes: 3

Views: 3960

Answers (2)

cmaher

Reputation: 5215

Here's one way to do it:

df['COUNT'] = df.select_dtypes(include='float64').gt(0).sum(axis=1)
#  NAME         X         Y         Z  COUNT
# 0    A -0.033066 -1.064625 -0.299286      0
# 1    B  0.902976 -1.703256 -0.011417      1
# 2    C -2.537364 -0.216643  1.051398      1
# 3    D  1.073677 -1.486599 -0.827829      1
# 4    E  2.157901  0.425371 -1.659263      2
# 5    F -1.589662 -0.382535  0.454324      1
# 6    G  0.487965  0.279265  0.820486      3
# 7    H  0.496104 -0.680161  0.763793      2
# 8    I -0.034518 -0.479307 -0.071954      0
# 9    J -0.170412  0.558505 -1.742784      1

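The select_dtypes method filters the frame down to columns of a given dtype, which is handy here because you don't need to spell out column names. Be aware that it selects every float64 column, so if only specific columns should be counted, select them by name instead, e.g. df[['X', 'Y', 'Z']].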

The .gt method compares each value element-wise against its argument (in this case 0) and returns a boolean DataFrame. Summing across each row (axis=1) treats True as 1, which gives the count of values meeting our criterion.
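To make the two steps concrete, here is a minimal sketch on a small hand-made frame. The values are illustrative only (they are not the random data from the question), and the columns are selected by name rather than by dtype, which is an assumption about what "specified columns" means here.

```python
import pandas as pd

# Hand-made values chosen for illustration; not the question's random data.
df = pd.DataFrame({
    'NAME': ['A', 'B', 'C'],
    'X': [-0.5, 1.2, 0.3],
    'Y': [-2.0, -1.0, 0.7],
    'Z': [-0.1, -3.0, 0.0],
})

cols = ['X', 'Y', 'Z']

# Step 1: element-wise comparison gives a boolean DataFrame.
positive = df[cols].gt(0)

# Step 2: the row-wise sum counts the True values (True is treated as 1).
df['COUNT'] = positive.sum(axis=1)

print(df)
```

Row C ends up with a count of 2, not 3, because gt(0) is a strict comparison and 0.0 is not counted as positive.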

Upvotes: 9

Ali

Reputation: 177

I think I found a solution, so I'm posting here for future reference.

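import numpy as np
import pandas as pd

np.random.seed(11)  # fixed seed so the example is reproducible

df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))

# Every column except NAME
cols = df.columns.difference(['NAME'])
# Entries not covered by the positive mask (including NAME) become NaN,
# and count() tallies the remaining non-NaN values in each row
df['COUNT'] = df[df[cols] > 0].count(axis=1)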
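As a rough cross-check, the sketch below compares a mask-and-count formulation with the boolean-sum formulation on the same seeded data. It restricts the mask to the numeric columns before calling where, which is a slight variation on the code above rather than a copy of it, and the names count_via_mask and count_via_sum are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(11)
df = pd.DataFrame(np.random.randn(10, 3), columns=list('XYZ'))
df.insert(0, 'NAME', pd.Series(list('ABCDEFGHIJ')))

cols = ['X', 'Y', 'Z']
numeric = df[cols]

# Mask-and-count: non-positive entries become NaN, and count() skips NaN.
count_via_mask = numeric.where(numeric > 0).count(axis=1)

# Boolean sum: True counts as 1, False as 0.
count_via_sum = numeric.gt(0).sum(axis=1)

# Both formulations should give the same per-row counts.
same = count_via_mask.tolist() == count_via_sum.tolist()
print(same)
```

The boolean-sum version skips the intermediate NaN-filled frame, so it is probably the more direct choice when only the count is needed.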

Upvotes: 1
