I am working from a similar question on SO, "Count occurrences of False or True in a column in pandas", except that I need to limit the count to a range of rows in the pandas DataFrame.
Using that example, I would like to count the number of False occurrences in the rows whose patient_id is between 50000 and 80000.
I would first need to sort by the values in the first column (patient_id),
then find the range of rows whose values are between 50000 and 80000,
then count the number of False occurrences in that limited range.
The table is below:
patient_id test_result has_cancer
0 79452 Negative False
1 81667 Positive True
2 76297 Negative False
3 36593 Negative False
4 53717 Negative False
5 67134 Negative False
6 40436 Negative False
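The three steps above can be written out literally in pandas. This is a minimal sketch, assuming the table is in a DataFrame with the column names shown and that has_cancer holds real booleans; the function name count_false_sorted_range is hypothetical. After sorting, Series.searchsorted finds the position of the first patient_id at or above 50000 and one past the last one at or below 80000, so the in-range rows can be taken as a contiguous block with iloc.

```python
import pandas as pd

# The table from the question
df = pd.DataFrame({
    "patient_id": [79452, 81667, 76297, 36593, 53717, 67134, 40436],
    "test_result": ["Negative", "Positive", "Negative", "Negative",
                    "Negative", "Negative", "Negative"],
    "has_cancer": [False, True, False, False, False, False, False],
})


def count_false_sorted_range(frame, low, high):
    """Hypothetical helper: count has_cancer == False for low <= patient_id <= high."""
    # Step 1: sort by patient_id so the wanted rows form one contiguous block
    ordered = frame.sort_values("patient_id")
    ids = ordered["patient_id"]
    # Step 2: first position with id >= low, and one past the last id <= high
    start = ids.searchsorted(low, side="left")
    stop = ids.searchsorted(high, side="right")
    block = ordered.iloc[start:stop]
    # Step 3: ~ flips the booleans, so summing counts the False entries
    return int((~block["has_cancer"]).sum())


print(count_false_sorted_range(df, 50000, 80000))  # 4
```

The answer below shows that the sort is optional when a boolean filter is used instead of a positional slice; the searchsorted variant only works because the column has been sorted first.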
I will assume the data you presented is in a variable called "df".
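import pandas as pd
# 0 the question's table as a DataFrame
df = pd.DataFrame({
    'patient_id': [79452, 81667, 76297, 36593, 53717, 67134, 40436],
    'test_result': ['Negative', 'Positive', 'Negative', 'Negative', 'Negative', 'Negative', 'Negative'],
    'has_cancer': [False, True, False, False, False, False, False],
})
# 1 keep only the rows with 50000 <= patient_id <= 80000
df = df[(df['patient_id'] >= 50000) & (df['patient_id'] <= 80000)]
# 2 retrieves a Series with the count of each value (True/False)
df['has_cancer'].value_counts()
# 3 retrieves the number of False rows as a plain integer (4 for this data)
df[~df['has_cancer']].shape[0]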
0: if the data is stored in a CSV file, use pd.read_csv(); otherwise, look up how to read your specific file type with pandas
1: keeps only the rows whose patient_id lies in your interval, inclusive at both ends (assuming the IDs are stored as integers)
2: takes the "has_cancer" Series and counts the occurrences of each value (True or False in this case)
3: ~ inverts the boolean column, so this selects only the False rows, and shape[0] gives how many there are
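If the data comes from a CSV file, pd.read_csv parses the literal strings True and False into a bool column by default, which is what the ~ in step 3 relies on. The sketch below assumes a file shaped like the question's table (simulated here with io.StringIO) and uses Series.between, which is inclusive at both ends by default, as a shorthand for the filter in step 1. Reading the count with value_counts().get(False, 0) returns 0 instead of raising a KeyError when the selected rows contain no False values; the helper name false_count is hypothetical.

```python
import io

import pandas as pd

# Stand-in for a CSV file with the question's columns
csv_text = """patient_id,test_result,has_cancer
79452,Negative,False
81667,Positive,True
76297,Negative,False
36593,Negative,False
53717,Negative,False
67134,Negative,False
40436,Negative,False
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df["has_cancer"].dtype)  # bool


def false_count(frame, low, high):
    """Hypothetical helper: number of has_cancer == False rows with low <= patient_id <= high."""
    in_range = frame[frame["patient_id"].between(low, high)]
    counts = in_range["has_cancer"].value_counts()
    # .get falls back to 0 when False does not occur in the selected rows
    return int(counts.get(False, 0))


print(false_count(df, 50000, 80000))  # 4
print(false_count(df, 80000, 90000))  # 0
```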
There is no need to sort for this: the boolean filter selects the matching rows wherever they appear in the frame. If you do need sorted output, call df.sort_values(['patient_id']), passing a list of the column name(s) you want to sort by.