Luca Giorgi

Reputation: 1010

Count occurrences of True/False in column of dataframe

Is there a way to count the number of occurrences of boolean values in a column without having to loop through the DataFrame?

Doing something like

df[df["boolean_column"]==False]["boolean_column"].sum()

Will not work because False has a value of 0, hence a sum of zeroes will always return 0.

Obviously you could count the occurrences by looping over the column and checking, but I wanted to know if there's a pythonic way of doing this.

Upvotes: 24

Views: 73362

Answers (8)

Rich Andrews

Reputation: 1680

Here is an attempt to answer as literally and briefly as possible. The value_counts() strategies are probably more flexible in the end. Note that sum (accumulation) and count (counting) are different operations, each expressing a different analytical intent, and the result of sum depends on the type of the data.

"Count occurrences of True/False in column of dataframe"

import pandas as pd
df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})

df[df==True].count()
#boolean_column    3
#dtype: int64

df[df!=False].count()
#boolean_column    3
#dtype: int64

df[df==False].count()
#boolean_column    2
#dtype: int64

Upvotes: 0

Petar Milinkovic

Reputation: 31

I couldn't find exactly what I needed here. I needed the number of True and False occurrences for further calculations, so I used:

true_count = df['column'].value_counts()[True]
false_count = df['column'].value_counts()[False]

Where df is your DataFrame and column is the column with booleans.
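
One caveat worth sketching: if the column happens to contain only one of the two values, indexing the counts with the missing label raises a KeyError. A minimal sketch of a safer lookup, assuming a made-up DataFrame whose column has no False values, uses Series.get with a default of 0:

```python
import pandas as pd

# Hypothetical data: the column contains no False values at all.
df = pd.DataFrame({"column": [True, True, True]})

counts = df["column"].value_counts()

# .get returns the default instead of raising KeyError for a missing label.
true_count = int(counts.get(True, 0))
false_count = int(counts.get(False, 0))

print(true_count, false_count)
```

Computing value_counts() once and reusing the result also avoids scanning the column twice.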

Upvotes: 3

turbojet780

Reputation: 11

df.isnull()

returns a boolean DataFrame of the same shape, where True indicates a missing value.

df.isnull().sum()

returns the column-wise count of True values.

df.isnull().sum().sum()

returns the total number of NA elements.
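
As a small illustration of those three calls, here is a sketch on an invented DataFrame (the column names x and y and their contents are assumptions, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing entries in total.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, "a"],
})

mask = df.isnull()               # boolean DataFrame, True where a value is missing
per_column = mask.sum()          # missing values per column
total = int(mask.sum().sum())    # missing values in the whole DataFrame

print(per_column)
print(total)
```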

Upvotes: 1

Andrea Grianti

Reputation: 50

If you have a column of boolean values, or, more interestingly, if you don't but want to count the values in a column that satisfy some condition, you can try something like this (using <= as an example):

(df['col']<=value).value_counts()

The parentheses build a boolean Series from the comparison, and value_counts() returns a Series of counts indexed by False and True (not a tuple). You can use it in other calculations without creating an additional variable by indexing it with the label, [False] for the False count and [True] for the True count:

(df['col']<=value).value_counts()[False] # count of False values
(df['col']<=value).value_counts()[True]  # count of True values
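
A minimal sketch of counting rows that satisfy a condition, assuming a made-up column and threshold (neither is from the original post). It looks up the value_counts() result by the labels True and False and cross-checks the True count with sum():

```python
import pandas as pd

# Hypothetical data and threshold, for illustration only.
df = pd.DataFrame({"col": [1, 5, 3, 8, 2]})
value = 3

mask = df["col"] <= value       # boolean Series
counts = mask.value_counts()    # Series of counts indexed by True / False

n_true = int(counts[True])      # rows where col <= value
n_false = int(counts[False])    # rows where col > value

# Summing the mask counts the True entries directly.
n_true_by_sum = int(mask.sum())

print(n_true, n_false, n_true_by_sum)
```

Looking up by label rather than by position keeps the result correct regardless of which value is more frequent, since value_counts() sorts by count.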

Upvotes: 0

Jakob

Reputation: 893

This alternative works for multiple columns and/or rows as well. 

df[df==True].count(axis=0)

This gives the number of True values per column. For a row-wise count, set axis=1.

df[df==True].count().sum()

Adding sum() at the end gives the total number of True values in the entire DataFrame.
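
If every column is boolean, a similar result can be sketched by summing directly, since True counts as 1. The column names and data below are invented for illustration, and this variation assumes there are no non-boolean columns:

```python
import pandas as pd

# Hypothetical all-boolean DataFrame.
df = pd.DataFrame({
    "a": [True, False, True],
    "b": [False, False, True],
})

per_column = df.sum()               # True values per column
per_row = df.sum(axis=1)            # True values per row
total = int(df.to_numpy().sum())    # True values in the whole DataFrame

print(per_column)
print(per_row)
print(total)
```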

Upvotes: 2

user3471881

Reputation: 2724

Use pd.Series.value_counts():

>>> df = pd.DataFrame({'boolean_column': [True, False, True, False, True]})
>>> df['boolean_column'].value_counts()
True     3
False    2
Name: boolean_column, dtype: int64

If you want to count False and True separately you can use pd.Series.sum() + ~:

>>> df['boolean_column'].values.sum()  # True
3
>>> (~df['boolean_column']).values.sum() # False
2

Upvotes: 50

jpp

Reputation: 164843

With Pandas, the natural way is using value_counts:

df = pd.DataFrame({'A': [True, False, True, False, True]})

print(df['A'].value_counts())

# True     3
# False    2
# Name: A, dtype: int64

To calculate True or False values separately, don't compare against True / False explicitly, just sum and take the reverse Boolean via ~ to count False values:

print(df['A'].sum())     # 3
print((~df['A']).sum())  # 2

This works because bool is a subclass of int, and the behaviour also holds true for Pandas series / NumPy arrays.
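
To make the bool-as-int point concrete, here is a short sketch (the Series contents mirror the example above; the variable names are only illustrative):

```python
import pandas as pd

# In plain Python, bool is a subclass of int, so True behaves like 1.
print(issubclass(bool, int))
print(True + True)

s = pd.Series([True, False, True, False, True])

n_true = int(s.sum())        # each True contributes 1
n_false = int((~s).sum())    # ~ flips the values, so this counts False

print(n_true, n_false)
```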

Alternatively, you can calculate counts using NumPy:

print(np.unique(df['A'], return_counts=True))

# (array([False,  True], dtype=bool), array([2, 3], dtype=int64))

Upvotes: 10

FMarazzi

Reputation: 623

You could simply sum:

sum(df["boolean_column"])

This will find the number of "True" elements.

len(df["boolean_column"]) - sum(df["boolean_column"])

Will yield the number of "False" elements.

Upvotes: 1
