Joining Pandas dataframes based on time intervals and calculating average values

Question

I have a tricky question:

There are two dataframes. The first, 'TimeRanges', contains time ranges, each with an ID, a start time and an end time, like this:

ID  StartTime       EndTime
1   01.03.18 12:00  01.03.18 13:00 
2   01.03.18 13:00  01.03.18 13:15 
3   01.03.18 13:30  01.03.18 14:55

The second dataframe has a Time column, whose values increase at a frequency of one minute, and a Value column, like this:

Time            Value
01.03.18 12:00  5.00
01.03.18 12:01  20.00
01.03.18 12:02  5.00
01.03.18 13:10  30.00
01.03.18 14:20  45.00

What I am trying to achieve, and don't know how to approach, is a new column AvgValue in the TimeRanges dataframe that contains the mean() of the Values whose Time falls in the interval between StartTime and EndTime, for example:

ID  StartTime       EndTime         AvgValue
1   01.03.18 12:00  01.03.18 13:00     10
2   01.03.18 13:00  01.03.18 13:15     30
3   01.03.18 13:30  01.03.18 14:55     45

*The value is 10 because only the times 01.03.18 12:00, 01.03.18 12:01 and 01.03.18 12:02 fall in the interval from 01.03.18 12:00 to 01.03.18 13:00, so the mean is calculated over those values only: (5 + 20 + 5) / 3 = 10.
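The requirement can be restated directly in code. This is a minimal sketch, assuming the dates are day.month.year (so 01.03.18 is 1 March 2018) and that both interval ends are inclusive. The helper mean_in_range is hypothetical, and the per-row boolean mask is the most literal translation of the definition rather than an optimized approach:

```python
import pandas as pd

FMT = "%d.%m.%y %H:%M"  # assumption: 01.03.18 means 1 March 2018

time_ranges = pd.DataFrame({
    "ID": [1, 2, 3],
    "StartTime": pd.to_datetime(["01.03.18 12:00", "01.03.18 13:00", "01.03.18 13:30"], format=FMT),
    "EndTime": pd.to_datetime(["01.03.18 13:00", "01.03.18 13:15", "01.03.18 14:55"], format=FMT),
})
values = pd.DataFrame({
    "Time": pd.to_datetime(["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
                            "01.03.18 13:10", "01.03.18 14:20"], format=FMT),
    "Value": [5.0, 20.0, 5.0, 30.0, 45.0],
})


def mean_in_range(start, end):
    """Hypothetical helper: mean of Value for readings with start <= Time <= end."""
    mask = (values["Time"] >= start) & (values["Time"] <= end)
    return values.loc[mask, "Value"].mean()


# one mean per range, in the same order as the rows of time_ranges
time_ranges["AvgValue"] = [
    mean_in_range(start, end)
    for start, end in zip(time_ranges["StartTime"], time_ranges["EndTime"])
]
print(time_ranges)
```

This scans the whole values table once per range, which is fine for small data; a range with no readings gets NaN.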

What would be the approach to do that? Lambda functions, or something else?

Thanks

gyx-hh · Accepted Answer

I achieved this with resample, but it requires a bit of fiddling, so it might not be the best solution. First of all, resample needs the index to be of type DatetimeIndex, TimedeltaIndex or PeriodIndex.

import pandas as pd

# set Time to be index
df.set_index('Time', inplace=True)
# change index type to datetime (pass dayfirst=True if 01.03.18 means 1 March 2018)
df.index = pd.to_datetime(df.index)
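The dates in the question look like day.month.year, but pd.to_datetime reads an ambiguous string such as 01.03.18 month first by default, which matches the 2018-01-03 dates in the output further down. A small check, assuming 1 March 2018 is the intended reading:

```python
import pandas as pd

raw = "01.03.18 12:00"

# default parsing treats the first field as the month -> 3 January 2018
default = pd.to_datetime(raw)
# dayfirst=True, or an explicit format string, gives 1 March 2018
day_first = pd.to_datetime(raw, dayfirst=True)
explicit = pd.to_datetime(raw, format="%d.%m.%y %H:%M")

print(default, day_first, explicit)
```

An explicit format is the stricter choice: it fails loudly on a malformed string instead of guessing.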

Using resample: I have used a rule of 60 minutes. You can look at the available rules in the following link: resample

# '60min' = 60-minute bins (older code writes '60T'; the 'T' alias is deprecated in pandas 2.2)
new_df = df.resample('60min').mean().reset_index()
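To see what the 60-minute rule does with the sample values, here is a self-contained run. It assumes day.month.year dates, so the timestamps fall on 2018-03-01 rather than the 2018-01-03 shown in the output below, but the bins and means are the same:

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
             "01.03.18 13:10", "01.03.18 14:20"],
    "Value": [5.0, 20.0, 5.0, 30.0, 45.0],
})
# assumption: day.month.year; resample needs a DatetimeIndex
df["Time"] = pd.to_datetime(df["Time"], format="%d.%m.%y %H:%M")
df = df.set_index("Time")

# 60-minute bins labelled by their left edge: 12:00, 13:00, 14:00
new_df = df.resample("60min").mean().reset_index()
print(new_df)
```

A bin that contains no readings would still appear, with a NaN mean, as long as it lies between the first and last timestamp.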

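Now new_df holds the average Value for every fixed 60-minute bin (12:00, 13:00, 14:00, ...). Note that these bins do not follow the irregular ranges in TimeRanges. We just need the following to get it into the format you want.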

from datetime import timedelta
# each bin ends 60 minutes (3600 s) after it starts
new_df['EndTime'] = new_df['Time'] + timedelta(seconds=3600)
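One way to keep the bin width and the computed EndTime from drifting apart is to hold the width in a single pd.Timedelta and pass it to both resample and the addition. A sketch with the sample values already on a DatetimeIndex (the variable name freq is only illustrative):

```python
import pandas as pd

freq = pd.Timedelta(minutes=60)  # single source of truth for the bin width

df = pd.DataFrame(
    {"Value": [5.0, 20.0, 5.0, 30.0, 45.0]},
    index=pd.to_datetime(["2018-03-01 12:00", "2018-03-01 12:01", "2018-03-01 12:02",
                          "2018-03-01 13:10", "2018-03-01 14:20"]),
)

# resample accepts a Timedelta as its rule, so the same freq drives both steps
new_df = df.resample(freq).mean().rename_axis("StartTime").reset_index()
new_df["EndTime"] = new_df["StartTime"] + freq
print(new_df)
```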

And finally, rename the columns:

new_df.rename(columns={'Time': 'StartTime', 'Value': 'AvgValue'}, inplace=True)

Output:

    StartTime             AvgValue    EndTime
0   2018-01-03 12:00:00   10.0        2018-01-03 13:00:00
1   2018-01-03 13:00:00   30.0        2018-01-03 14:00:00
2   2018-01-03 14:00:00   45.0        2018-01-03 15:00:00
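Because resample only produces fixed-width bins, the ranges in the output above (e.g. 13:00 to 14:00) differ from the requested ones (13:00 to 13:15). A different technique, not part of this answer, averages over the exact TimeRanges intervals by mapping every Time to its containing interval with pd.IntervalIndex and then grouping. This is a minimal sketch, assuming the ranges never overlap (get_indexer refuses overlapping intervals) and treating them as half-open [StartTime, EndTime) so the shared 13:00 boundary of ranges 1 and 2 is not an overlap; the sample data is parsed as day.month.year:

```python
import pandas as pd

FMT = "%d.%m.%y %H:%M"  # assumption: day.month.year

time_ranges = pd.DataFrame({
    "ID": [1, 2, 3],
    "StartTime": pd.to_datetime(["01.03.18 12:00", "01.03.18 13:00", "01.03.18 13:30"], format=FMT),
    "EndTime": pd.to_datetime(["01.03.18 13:00", "01.03.18 13:15", "01.03.18 14:55"], format=FMT),
})
values = pd.DataFrame({
    "Time": pd.to_datetime(["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
                            "01.03.18 13:10", "01.03.18 14:20"], format=FMT),
    "Value": [5.0, 20.0, 5.0, 30.0, 45.0],
})

# half-open [StartTime, EndTime) intervals, so touching ranges do not overlap
intervals = pd.IntervalIndex.from_arrays(
    time_ranges["StartTime"], time_ranges["EndTime"], closed="left"
)

# for each Time: row position of the range containing it, or -1 if none does
pos = intervals.get_indexer(values["Time"])

# average per range position; ranges without any reading end up as NaN
inside = pos >= 0
means = values["Value"][inside].groupby(pos[inside]).mean()
time_ranges["AvgValue"] = means.reindex(range(len(time_ranges))).to_numpy()
print(time_ranges)
```

This visits the values once instead of once per range; if the ranges can overlap, the per-row approach in the edit below is the safer choice.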

EDIT: To use the first dataframe (df1) for the time ranges instead, apply a function to each row of df1 that averages the values in df between that row's StartTime and EndTime:

# StartTime/EndTime must be datetimes; .loc slicing is inclusive at both ends
df1['AvgValue'] = df1.apply(lambda row: df.loc[row['StartTime']:row['EndTime'], 'Value'].mean(), axis=1)
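A self-contained run of this per-row approach on the sample data, written with apply(axis=1) and the AvgValue column name from the question. It assumes day.month.year dates and a sorted DatetimeIndex on df, which label slicing with .loc relies on:

```python
import pandas as pd

FMT = "%d.%m.%y %H:%M"  # assumption: day.month.year

# readings, indexed by time and sorted so .loc can slice by label
df = pd.DataFrame({
    "Time": pd.to_datetime(["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
                            "01.03.18 13:10", "01.03.18 14:20"], format=FMT),
    "Value": [5.0, 20.0, 5.0, 30.0, 45.0],
}).set_index("Time").sort_index()

# the time ranges, with bounds parsed to Timestamps
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "StartTime": pd.to_datetime(["01.03.18 12:00", "01.03.18 13:00", "01.03.18 13:30"], format=FMT),
    "EndTime": pd.to_datetime(["01.03.18 13:00", "01.03.18 13:15", "01.03.18 14:55"], format=FMT),
})

# for each range row, slice df between its bounds (inclusive) and average Value
df1["AvgValue"] = df1.apply(
    lambda row: df.loc[row["StartTime"]:row["EndTime"], "Value"].mean(), axis=1
)
print(df1)
```

Because .loc includes both endpoints, a reading stamped exactly at 13:00 would be counted in both range 1 and range 2.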
