Reputation: 698
Given the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.uniform(-1,1,size=(6, 2)), columns=list('AB'))
          A         B
0  0.179713  0.341367
1 -0.439868  0.999864
2 -0.253476 -0.816107
3 -0.829449 -0.562657
4  0.174300  0.055969
5  0.922375  0.987108
How can I calculate the percentage (as a fraction) of rows that are greater than 0 in a specific column and get back only a single float value?
The following code returns a Series with one entry per column. Because the rows are filtered on A before counting, the value reported for B is just a copy of the value for A.
a = df[df['A'] > 0].count()/df['A'].count()
A 0.5
B 0.5
dtype: float64
However, the desired output is only a single float value and not a Series.
Desired output:
0.5
Upvotes: 1
Views: 5093
Reputation: 35
I would just add a new column that takes the value True when your condition is met. We can then take the mean of this column of Boolean values.
df['check'] = df.A > 0 # this creates the new column
# This returns the single value for the percentage you are looking for.
df.check.mean()
# The mean of a Boolean column is the fraction of True values, returned as a float
Or, if we just want the value, we can do it in one line without creating a new column.
(df.A > 0).mean()
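As a quick check of the one-liner, here is a minimal sketch that uses the fixed values printed in the question instead of random data, so the result is reproducible. The names mask and fraction are only illustrative, and the float() call is an optional extra that converts the NumPy scalar into a plain Python float.

```python
import pandas as pd

# Fixed values copied from the question's sample output (not random),
# so the result below is reproducible.
df = pd.DataFrame({
    "A": [0.179713, -0.439868, -0.253476, -0.829449, 0.174300, 0.922375],
    "B": [0.341367, 0.999864, -0.816107, -0.562657, 0.055969, 0.987108],
})

# Boolean Series: True where A is greater than 0.
mask = df["A"] > 0

# True counts as 1 and False as 0, so the mean is the fraction of True values.
fraction = float(mask.mean())

print(fraction)  # 0.5 (3 of the 6 values in A are positive)
```

The same pattern works for any column or condition, for example (df["B"] > 0).mean().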
Upvotes: 3
Reputation: 323276
You can use loc. Your code calls count() on a filtered DataFrame, which returns a Series of per-column counts. Selecting only column A gives a Series, so count() returns a single number:
a = df.loc[df['A'] > 0,'A'].count()/df['A'].count()
a
Out[58]: 0.5
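One thing to keep in mind with this approach: count() skips missing values, so if the column contains NaN the denominator is the number of non-null rows, whereas the mean of a Boolean mask divides by all rows. The sketch below uses a small made-up column (not the question's data) to show the difference; by_count and by_mean are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value, used only to illustrate NaN handling.
df = pd.DataFrame({"A": [1.0, -1.0, np.nan, 2.0]})

# count() ignores NaN: 2 positive values out of 3 non-null values.
by_count = df.loc[df["A"] > 0, "A"].count() / df["A"].count()

# NaN > 0 evaluates to False, so the mask mean divides by all 4 rows.
by_mean = (df["A"] > 0).mean()

print(by_count)
print(by_mean)
```

With no missing values, as in the question's DataFrame, both expressions give the same result.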
Upvotes: 6