Reputation: 21
Background: I am trying to use data from a csv file to make asks questions and make conclusions base on data. The data is a log of patient visits from a clinic in Brazil, including additional patient data, and whether the patient was a no show or not. I have chosen to examine correlations between the patient's age and the no show data.
Problem: Given visit number, patient ID, age, and no show data, how do I compile an array of ages that correlate with the each unique patient ID (so that I can evaluate the mean age of total unique patients visiting the clinic).
My code:
# data set of no shows at a clinic in Brazil
noshow_data = pd.read_csv('noshowappointments-kagglev2-may-2016.csv')
noshow_df = pd.DataFrame(noshow_data)
Here is the beginning of the code, with the head of the whole dataframe of the csv given
# Next I construct a dataframe with only the data I'm interested in:
ptid = noshow_df['PatientId']
ages = noshow_df['Age']
noshow = noshow_df['No-show']
ptid_ages_noshow = pd.DataFrame({'PatientId' : pt_id, 'Ages' : ages,
'No_show' : noshow})
ptid_ages_noshow
Here I have sorted the data to show the multiple visits of a unique patient
# Now, I know how to determine the total number of unique patients:
# total number of unique patients
num_unique_pts = noshow_df.PatientId.unique()
len(num_unique_pts)
If I want to find the mean age of all the patients during the course of all visits I would use:
# mean age of all vists
ages = noshow_data['Age']
ages.mean()
So my question is this, how could I find the mean age of all the unique patients?
Upvotes: 2
Views: 316
Reputation: 3867
So you only want to keep one appointment per patient for the calculation? This is how to do it:
noshow_df.drop_duplicates('PatientId')['Age'].mean()
Keep in mind that the age of people changes over time. You need to decide how you want to handle this.
Upvotes: 0
Reputation: 359
You can simply use the groupby function available in pandas
with restriction to the concerned columns :
ptid_ages_noshow[['PatientId','Ages']].groupby('PatientId').mean()
Upvotes: 1