Python/DataFrame: Calculate percentage of occurrences/rows when value is greater than zero

Given the following DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.uniform(-1,1,size=(6, 2)), columns=list('AB'))


          A         B
0  0.179713  0.341367
1 -0.439868  0.999864
2 -0.253476 -0.816107
3 -0.829449 -0.562657
4  0.174300  0.055969
5  0.922375  0.987108

How can I calculate the percentage of rows/entries that are greater than 0 for a specific column and only return the float value?

The following code returns a Series with one entry per column, so the value for A is repeated under B.

a = df[df['A'] > 0].count()/df['A'].count()

A    0.5
B    0.5
dtype: float64

However, the desired output is only a single float value and not a Series.

Desired output:
    0.5

Upvotes: 1

Views: 5093

Answers (2)

Taranta

Reputation: 35

I would just add a new column that takes the value True when your condition is met. We can then take the mean of this column of Boolean values.

df['check'] = df.A > 0  # this creates the new column

# The mean of a Boolean column is the fraction of True values,
# so this returns the single float you are looking for.
df.check.mean()

Or, if we just want the value, we can do it in one line without creating a new column.

(df.A > 0).mean()
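
As a minimal sketch of why this works: comparing a column to 0 gives a Boolean Series, and its mean is the fraction of True values. The data below hard-codes the values printed in the question so the result is reproducible; the names `mask`, `fraction` and `percent` are only for illustration.

```python
import pandas as pd

# Values copied from the question's printed DataFrame so the result is reproducible.
df = pd.DataFrame({
    "A": [0.179713, -0.439868, -0.253476, -0.829449, 0.174300, 0.922375],
    "B": [0.341367, 0.999864, -0.816107, -0.562657, 0.055969, 0.987108],
})

mask = df["A"] > 0             # Boolean Series: True where A is positive
fraction = float(mask.mean())  # True counts as 1 and False as 0
percent = fraction * 100       # scale to a percentage if that is what you need

print(fraction)
print(percent)
```

Note that the mean divides by the total number of rows. If column A contained NaN values, those rows would compare as False and still be counted in the denominator, whereas count() skips them.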

Upvotes: 3

BENY

Reputation: 323276

You can use loc. Your original code calls count() on a filtered DataFrame, which returns one count per column. Selecting only column A with loc gives a Series, so count() returns a single number.

a = df.loc[df['A'] > 0,'A'].count()/df['A'].count()
a
Out[58]: 0.5
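
To illustrate the difference, here is a small sketch using the values printed in the question. The names `per_column` and `ratio` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Values copied from the question's printed DataFrame so the result is reproducible.
df = pd.DataFrame({
    "A": [0.179713, -0.439868, -0.253476, -0.829449, 0.174300, 0.922375],
    "B": [0.341367, 0.999864, -0.816107, -0.562657, 0.055969, 0.987108],
})

# Filtering the whole DataFrame keeps both columns, so count() returns a Series.
per_column = df[df["A"] > 0].count() / df["A"].count()

# Selecting only column A with loc yields a Series, so count() returns a scalar.
ratio = df.loc[df["A"] > 0, "A"].count() / df["A"].count()

print(type(per_column).__name__)
print(ratio)
```

If you need a plain Python float rather than a NumPy scalar, wrapping the result in float() should be enough.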

Upvotes: 6
