Reputation: 1010
Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?
Doing something like
df[df["boolean_column"]==False]["boolean_column"].sum()
will not work because False has a value of 0, so a sum of zeros always returns 0.
Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.
Upvotes: 24
Views: 73362
Reputation: 1680
Here is an attempt to be as literal and brief as possible in providing an answer. The value_counts() strategies are probably more flexible in the end. Accumulating with sum and counting with count are different operations, each expressing a different analytical intent, and the result of sum depends on the type of the data.
"Count occurrences of True/False in column of DataFrame"
import pandas as pd
df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
df[df==True].count()
#boolean_column 3
#dtype: int64
df[df!=False].count()
#boolean_column 3
#dtype: int64
df[df==False].count()
#boolean_column 2
#dtype: int64
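The difference between summing and counting mentioned above can be seen on a single column. This is only a minimal sketch, assuming the same five-element example column; the variable names are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
col = df['boolean_column']

total_entries = col.count()    # counts non-null entries, whatever their value
true_count = col.sum()         # adds the values up, with True treated as 1
false_count = (~col).sum()     # invert the booleans first, then add them up

print(total_entries, true_count, false_count)
```

Here count() reports how many entries exist, while sum() depends on what the entries are, which is why filtering for False and then summing always gives 0.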
Upvotes: 0
Reputation: 31
I couldn't find exactly what I needed here. I needed the number of True and False occurrences for further calculations, so I used:
true_count = df['column'].value_counts()[True]
false_count = df['column'].value_counts()[False]
Where df is your DataFrame and column is the column with booleans.
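One thing to watch for with this approach: if the column happens to contain no True (or no False) values, that label is absent from the result and the lookup raises a KeyError. A possible workaround, sketched here with a made-up all-True column, is to reindex the counts so that both labels are always present.

```python
import pandas as pd

# Hypothetical column that contains no False values at all
df = pd.DataFrame({'column': [True, True, True]})

# reindex guarantees both labels exist; missing ones get a count of 0
counts = df['column'].value_counts().reindex([True, False], fill_value=0)
true_count = counts[True]
false_count = counts[False]

print(true_count, false_count)
```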
Upvotes: 3
Reputation: 11
df.isnull() returns a DataFrame of boolean values, where True indicates a missing value.
df.isnull().sum() returns the column-wise count of True values.
df.isnull().sum().sum() returns the total number of NA elements.
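As a rough illustration of the three calls above, here is a small sketch on made-up data (the column names and values are assumptions for the example only).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in 'a' and two in 'b'
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 'x']})

missing_mask = df.isnull()                # boolean DataFrame, True where a value is missing
missing_per_column = missing_mask.sum()   # number of missing values in each column
total_missing = missing_per_column.sum()  # number of missing values in the whole frame

print(missing_per_column)
print(total_missing)
```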
Upvotes: 1
Reputation: 50
If you have a column of boolean values in a DataFrame or, more interestingly, you do not have one but want to count the values in a column that satisfy a certain condition, you can try something like this (using <= as an example):
(df['col']<=value).value_counts()
The comparison in parentheses produces a boolean Series, and value_counts() returns a Series of counts indexed by True/False (not a tuple). You can use it in other calculations without creating an additional variable by indexing with the label, [False] for the False count and [True] for the True count:
(df['col']<=value).value_counts()[False] # count of False
(df['col']<=value).value_counts()[True] # count of True
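To make the condition-based count concrete, here is a small sketch on made-up data; the column contents and the threshold are assumptions for illustration. It also shows that summing the boolean mask gives the same numbers as looking up the labels in value_counts().

```python
import pandas as pd

# Hypothetical data and threshold
df = pd.DataFrame({'col': [1, 5, 3, 8, 2]})
value = 3

mask = df['col'] <= value      # boolean Series: True where the condition holds
counts = mask.value_counts()   # Series of counts, indexed by True/False

n_true = mask.sum()            # True counts as 1, so this is the number of matches
n_false = (~mask).sum()        # invert the mask to count the non-matches

print(counts[True], counts[False])
print(n_true, n_false)
```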
Upvotes: 0
Reputation: 893
This alternative works for multiple columns and/or rows as well.
df[df==True].count(axis=0)
This gets you the total number of True values per column. For a row-wise count, set axis=1.
df[df==True].count().sum()
Adding a sum() at the end gets you the total count for the entire DataFrame.
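A short sketch of the per-column, per-row, and overall counts on a made-up two-column frame (the data is an assumption for illustration). For a frame that holds only booleans, df.sum() should give the same per-column numbers.

```python
import pandas as pd

# Hypothetical all-boolean frame
df = pd.DataFrame({'a': [True, False, True], 'b': [False, False, True]})

# Masking with df == True replaces everything else with NaN, which count() ignores
per_column = df[df == True].count(axis=0)
per_row = df[df == True].count(axis=1)
total = df[df == True].count().sum()

# Alternative for purely boolean data: True is summed as 1
per_column_via_sum = df.sum()

print(per_column)
print(per_row)
print(total)
```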
Upvotes: 2
Reputation: 2724
>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>> df['boolean_column'].value_counts()
True 3
False 2
Name: boolean_column, dtype: int64
If you want to count False and True separately, you can use pd.Series.sum() together with ~:
>> df['boolean_column'].values.sum() # True
3
>> (~df['boolean_column']).values.sum() # False
2
Upvotes: 50
Reputation: 164843
With Pandas, the natural way is using value_counts:
import pandas as pd
df = pd.DataFrame({'A': [True, False, True, False, True]})
print(df['A'].value_counts())
# True 3
# False 2
# Name: A, dtype: int64
To count True or False values separately, don't compare against True / False explicitly; just sum, and take the Boolean inverse via ~ to count False values:
print(df['A'].sum()) # 3
print((~df['A']).sum()) # 2
This works because bool is a subclass of int, and the behaviour also holds true for Pandas series / NumPy arrays.
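The subclass relationship can be checked directly. The following is just a small sketch with illustrative variable names, showing that plain Python, a Pandas series, and a NumPy array all sum booleans the same way.

```python
import numpy as np
import pandas as pd

print(issubclass(bool, int))   # bool derives from int
print(True + True)             # so True behaves like 1 in arithmetic

values = [True, False, True, False, True]
builtin_total = sum(values)              # plain Python sum
series_total = pd.Series(values).sum()   # Pandas series sum
array_total = np.array(values).sum()     # NumPy array sum

print(builtin_total, series_total, array_total)
```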
Alternatively, you can calculate counts using NumPy:
import numpy as np
print(np.unique(df['A'], return_counts=True))
# (array([False, True], dtype=bool), array([2, 3], dtype=int64))
Upvotes: 10
Reputation: 623
You could simply sum:
sum(df["boolean_column"])
This will find the number of "True" elements.
len(df["boolean_column"]) - sum(df["boolean_column"])
This will yield the number of "False" elements.
Upvotes: 1