Reputation: 93
I have a pandas dataframe that looks like this, whereby each row represents data collected on a different day (days 1 -> 5) for each participant (long form).
ID Heart_Rate
1 89
1 98
1 99
1 73
1 54
...
24 88
24 90
24 79
24 92
24 97
How can I aggregate the data over the first 3 days for each participant such that I create a new data frame with 1 row for each patient whereby the data represents the mean heart rate over 72 hours.
Upvotes: 1
Views: 230
Reputation: 71689
We can set the index
of dataframe to ID
then group
the dataframe on level=0
and aggregate using head
to select first three rows for each user ID
then take mean
on level=0
to get the average heart rate over the first 72
hours:
out = df.set_index('ID').groupby(level=0).head(3).mean(level=0)
Alternate approach which is more efficient but applicable only if there are always equal number of rows present corresponding to each user ID
and dataframe is sorted on ID
column:
n_days = 5 # Number of rows present for each user ID
n_days_to_avg = 3 # First n rows/days to average
m = np.isin(np.r_[:len(df)] % n_days, np.r_[:n_days_to_avg])
out = df[m].groupby('ID').mean()
>>> out
Heart_Rate
ID
1 95.333333
24 85.666667
Upvotes: 2