ummendial
ummendial

Reputation: 93

How to aggregate time-series data over specific ranges?

I have a pandas dataframe that looks like this, whereby each row represents data collected on a different day (days 1 -> 5) for each participant (long form).

ID    Heart_Rate
1         89
1         98
1         99 
1         73 
1         54
...
24        88
24        90
24        79
24        92
24        97

How can I aggregate the data over the first 3 days for each participant such that I create a new data frame with 1 row for each patient whereby the data represents the mean heart rate over 72 hours.

Upvotes: 1

Views: 230

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

We can set the index of dataframe to ID then group the dataframe on level=0 and aggregate using head to select first three rows for each user ID then take mean on level=0 to get the average heart rate over the first 72 hours:

out = df.set_index('ID').groupby(level=0).head(3).mean(level=0)

Alternate approach which is more efficient but applicable only if there are always equal number of rows present corresponding to each user ID and dataframe is sorted on ID column:

n_days = 5 # Number of rows present for each user ID
n_days_to_avg = 3 # First n rows/days to average

m = np.isin(np.r_[:len(df)] % n_days, np.r_[:n_days_to_avg])
out = df[m].groupby('ID').mean()

>>> out

    Heart_Rate
ID            
1    95.333333
24   85.666667

Upvotes: 2

Related Questions