Mich

Reputation: 3594

Count occurrences of True or False for a range of rows in Pandas

I am working from a similar question on SO, "Count occurrences of False or True in a column in pandas", except that I need to limit the range of rows in the pandas DataFrame.

So, using this example, I would like to count the number of False occurrences in has_cancer for rows whose first column (patient_id) has a value between 50000 and 80000.

I would need to first sort by the values in that column,

then find the range of rows with values between 50000 and 80000,

then count the number of False occurrences within that limited range.

The table is below:

   patient_id  test_result  has_cancer
0       79452     Negative       False
1       81667     Positive        True
2       76297     Negative       False
3       36593     Negative       False
4       53717     Negative       False
5       67134     Negative       False
6       40436     Negative       False

Upvotes: 1

Views: 151

Answers (1)

finman69

Reputation: 311

I will assume the data you presented is in a variable called "df".

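import pandas as pd

# 0: load the data (built here from the table in the question)
df = pd.DataFrame({
    'patient_id': [79452, 81667, 76297, 36593, 53717, 67134, 40436],
    'test_result': ['Negative', 'Positive', 'Negative', 'Negative',
                    'Negative', 'Negative', 'Negative'],
    'has_cancer': [False, True, False, False, False, False, False],
})

# 1: keep only the rows with patient_id between 50000 and 80000 (inclusive)
df = df[(df['patient_id'] >= 50000) & (df['patient_id'] <= 80000)]

# 2: returns a Series with the count of each value
df['has_cancer'].value_counts()

# 3: returns the number of False rows as a plain integer
df[~df['has_cancer']].shape[0]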

0: if the data is stored in a CSV file, load it with pd.read_csv(); otherwise look up how to read your specific file type with pandas.

1: this line keeps only the rows you specified: those whose patient_id falls in your interval (assuming the ids are integers).

2: we take the "has_cancer" series and count the number of occurrences of each entry (True or False in this case).

3: the "has_cancer" series is already boolean, so passing it to df[...] would return only the rows where it is True. We use ~ to negate it and get only the rows where it is False. df.shape is a tuple of (number of rows, number of columns), so we take the item at index 0, which is the number of rows.
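As a compact variant of the steps above, here is a minimal sketch that rebuilds two columns of the question's table and counts the False values directly. It assumes the ids are integers and that both ends of the interval should be included; the names in_range and false_count are only illustrative.

```python
import pandas as pd

# Two columns copied from the table in the question.
df = pd.DataFrame({
    "patient_id": [79452, 81667, 76297, 36593, 53717, 67134, 40436],
    "has_cancer": [False, True, False, False, False, False, False],
})

# Series.between() is inclusive on both ends by default.
in_range = df[df["patient_id"].between(50000, 80000)]

# Negating a boolean column and summing it counts the False entries.
false_count = int((~in_range["has_cancer"]).sum())
print(false_count)  # 4
```

Summing the negated column avoids building a second filtered DataFrame just to read its shape.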

There is no need to sort for either of these solutions, but if you want to, call df.sort_values() and pass the column name, or a list of column names, to sort by.
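If you do want to follow the sort-then-slice approach from the question, one possible sketch is below. It is an alternative to the boolean mask, not a replacement for it: it sorts by patient_id and then uses Series.searchsorted to locate the boundaries of the interval. The names sorted_df, start, stop and window are only illustrative.

```python
import pandas as pd

# Two columns copied from the table in the question.
df = pd.DataFrame({
    "patient_id": [79452, 81667, 76297, 36593, 53717, 67134, 40436],
    "has_cancer": [False, True, False, False, False, False, False],
})

# Sort by patient_id so the rows in the interval become one contiguous block.
sorted_df = df.sort_values("patient_id").reset_index(drop=True)

# searchsorted requires sorted values; side="left"/"right" makes both ends inclusive.
start = sorted_df["patient_id"].searchsorted(50000, side="left")
stop = sorted_df["patient_id"].searchsorted(80000, side="right")

# Positional slice of the sorted frame, then count the False entries.
window = sorted_df.iloc[start:stop]
false_count = int((~window["has_cancer"]).sum())
print(false_count)  # 4
```

For a table this small the boolean mask is simpler; sorting mainly pays off when you need to slice the same column many times.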

Upvotes: 1
