Reputation: 169
I'm collecting heart rate data across time for multiple subjects. Different events occur during the course of the data collection, so the start of each event is recorded elsewhere. Each event would have started at at a slightly different time for each subject. I would like to bridge the information between the two data frames so that I can know the mean heart rate of the different subjects during each chunk of time marked as an event. How can I get the mean heart rates between certain time points that are marked as events in another data frame? For instance, how can I find the mean heart rate between event 2 and event 3?
import pandas as pd
import numpy as np
#example
example_g = [["4/20/21 4:20", 302, 0, 1, 2, 3, 4, 5],
["2/17/21 9:20",135, 1, 1.4, 1.8, 2, 8, 10],
["2/17/21 9:20", 111, 4, 5, 5.1, 5.2, 5.3, 5.4]]
example_g_table = pd.DataFrame(example_g,columns=['Date_Time','CID', 0, 1, 2, 3, 4, 5])
#Example Timestamps
example_s = [["4/20/21 4:20",302,0, 2, 3],
["2/17/21 9:20",135,0, 1, 4 ],
["2/17/21 9:20",111,3, 4, 5 ]]
example_s_table = pd.DataFrame(example_s,columns=['Date_Time','CID', "event_1", "event_2", "event_3"])
desired_outcome = [["4/20/21 4:20",302,2.5],
["2/17/21 9:20",135, 3.3 ],
["2/17/21 9:20",111, 5.35 ]]
desired_outcome_table = pd.DataFrame(desired_outcome,columns=['Date_Time','CID', "Average of data between Event 2 and Event 3"])
Upvotes: 0
Views: 223
Reputation: 3133
I was able to put together a function that I think works for this, but assumes that columns don't change orders or more get added. If there would be changes to the df shape, this would need to be updated for that.
First, I merged together your example_g_table
and example_s_table
to get them all together.
df = pd.merge(left=example_g_table,right=example_s_table,on=['Date_Time','CID'],how='left')
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5
Now we use a new function that will pull out the values of event_2
and event_3
, and return the average of the values of those previous column-values. We will later run df.apply
on this, so it will take in just a row at a time, as a series (I think, anyway).
def func(df):
event_2 = df['event_2']
event_3 = df['event_3']
start = int(event_2 + 2) # this assumes that the column called 0 will be the third (and starting at 0, it'll be the called 2), column 1 will be the third column, etc
end = int(event_3 + 2) # same as above
total = sum(df.iloc[start:end+1]) # this line is the key. It takes the sum of the values of columns in the range of start to finish
avg = total/(end-start+1) #(end-start+1) gets the count of things in our range
return avg
Last, we run df.apply
on this to get our new column.
df['avg'] = df.apply(func,axis=1)
df
Date_Time CID 0 1 2 3 4 5 event_1 event_2 event_3 avg
0 4/20/21 4:20 302 0 1.0 2.0 3.0 4.0 5.0 0 2 3 2.50
1 2/17/21 9:20 135 1 1.4 1.8 2.0 8.0 10.0 0 1 4 3.30
2 2/17/21 9:20 111 4 5.0 5.1 5.2 5.3 5.4 3 4 5 5.35
Upvotes: 1