Reputation: 3148
I have a tricky question.
There are two dataframes. The first, 'TimeRanges', holds time ranges (a start time and an end time, with an ID) like this:
ID StartTime EndTime
1 01.03.18 12:00 01.03.18 13:00
2 01.03.18 13:00 01.03.18 13:15
3 01.03.18 13:30 01.03.18 14:55
The second dataframe contains a Time column, with time values increasing at a frequency of one minute, and a Value column, like this:
Time Value
01.03.18 12:00 5.00
01.03.18 12:01 20.00
01.03.18 12:02 5.00
01.03.18 13:10 30.00
01.03.18 14:20 45.00
What I want to achieve, and don't know how to approach, is a new column AvgValue in the TimeRanges dataframe that contains the mean() of the Value entries whose Time falls in the interval between StartTime and EndTime, for example:
ID StartTime EndTime AvgValue
1 01.03.18 12:00 01.03.18 13:00 10
2 01.03.18 13:00 01.03.18 13:15 30
3 01.03.18 13:30 01.03.18 14:55 45
*The value is 10 because three times fall in the interval from 01.03.18 12:00 to 01.03.18 13:00 (01.03.18 12:00, 01.03.18 12:01, 01.03.18 12:02), so the mean is calculated only over their values.
What would be the approach to do that? Lambda functions, or something else?
Thanks
Upvotes: 1
Views: 369
Reputation: 28644
One option is conditional_join from pyjanitor (here df1 is TimeRanges and df2 is the dataframe with Time and Value):
# pip install pyjanitor
import pandas as pd
import janitor
(df2
    .conditional_join(
        df1,
        ('Time', 'StartTime', '>='),
        ('Time', 'EndTime', '<='))
    .groupby(['ID', 'StartTime', 'EndTime'], as_index=False)
    .Value
    .mean()
)
ID StartTime EndTime Value
0 1 2018-01-03 12:00:00 2018-01-03 13:00:00 10.0
1 2 2018-01-03 13:00:00 2018-01-03 13:15:00 30.0
2 3 2018-01-03 13:30:00 2018-01-03 14:55:00 45.0
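If pyjanitor is not available, the same inequality join can be sketched in plain pandas with a cross join followed by a filter. This is only a sketch under assumptions: df1 and df2 are rebuilt here from the question's sample data, the dates are parsed as day-first (which the question does not confirm), and the cross join creates len(df1) * len(df2) rows, so it suits small frames only.

```python
import pandas as pd

# Sample data rebuilt from the question; day-first parsing is an assumption.
fmt = "%d.%m.%y %H:%M"
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "StartTime": pd.to_datetime(
        ["01.03.18 12:00", "01.03.18 13:00", "01.03.18 13:30"], format=fmt),
    "EndTime": pd.to_datetime(
        ["01.03.18 13:00", "01.03.18 13:15", "01.03.18 14:55"], format=fmt),
})
df2 = pd.DataFrame({
    "Time": pd.to_datetime(
        ["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
         "01.03.18 13:10", "01.03.18 14:20"], format=fmt),
    "Value": [5.0, 20.0, 5.0, 30.0, 45.0],
})

# Pair every range with every reading (needs pandas >= 1.2 for how="cross"),
# then keep only the readings that fall inside their range, bounds included.
pairs = df1.merge(df2, how="cross")
inside = pairs[pairs["Time"].between(pairs["StartTime"], pairs["EndTime"])]

# Average per range and attach the result to the original ranges.
avg = (inside.groupby("ID", as_index=False)["Value"].mean()
       .rename(columns={"Value": "AvgValue"}))
result = df1.merge(avg, on="ID", how="left")
print(result)
```

The left merge at the end keeps ranges that contain no readings; their AvgValue is NaN.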
Upvotes: 0
Reputation: 1431
I achieved this with resample, but it also requires a bit of fiddling, so it might not be the best solution. First of all, we need the index to be of type DatetimeIndex, TimedeltaIndex or PeriodIndex.
# set Time to be index
df.set_index('Time', inplace=True)
# change index type to datetime
df.index = pd.to_datetime(df.index)
Using resample: I have used a rule of 60 minutes. You can look at the available rules in the resample documentation.
# '60min' is the 60-minute rule; the older 'T' alias is deprecated in recent pandas
new_df = df.resample('60min').mean().reset_index()
Now we have new_df with the average value for every 60 minutes. We just need to do the following to get it into the format you want:
from datetime import timedelta
new_df['EndTime'] = new_df['Time'] + timedelta(seconds=3600)
and finally renaming the columns:
new_df.rename(columns={'Time': 'StartTime', 'Value': 'AvgValue'}, inplace=True)
Output:
StartTime AvgValue EndTime
0 2018-01-03 12:00:00 10.0 2018-01-03 13:00:00
1 2018-01-03 13:00:00 30.0 2018-01-03 14:00:00
2 2018-01-03 14:00:00 45.0 2018-01-03 15:00:00
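EDIT: This time, using the first dataframe (df1) for the time ranges, you can do the following (df still has the sorted datetime index set above):
# slice df by each row's StartTime..EndTime (both ends inclusive) and average Value
df1['AvgValue'] = df1.apply(lambda row: df.loc[row['StartTime']:row['EndTime'], 'Value'].mean(), axis=1)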
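For reference, the row-wise slicing idea from the EDIT can be sketched end to end. The data is rebuilt from the question, day-first date parsing is an assumption, and range_mean is a hypothetical helper name used only for illustration.

```python
import pandas as pd

fmt = "%d.%m.%y %H:%M"  # day-first parsing is an assumption

# Readings indexed by time; label slicing needs a sorted DatetimeIndex.
df = pd.DataFrame(
    {"Value": [5.0, 20.0, 5.0, 30.0, 45.0]},
    index=pd.to_datetime(
        ["01.03.18 12:00", "01.03.18 12:01", "01.03.18 12:02",
         "01.03.18 13:10", "01.03.18 14:20"], format=fmt),
).sort_index()

df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "StartTime": pd.to_datetime(
        ["01.03.18 12:00", "01.03.18 13:00", "01.03.18 13:30"], format=fmt),
    "EndTime": pd.to_datetime(
        ["01.03.18 13:00", "01.03.18 13:15", "01.03.18 14:55"], format=fmt),
})

def range_mean(row):
    # Hypothetical helper: .loc slicing on the index includes both endpoints,
    # and an empty slice yields NaN as its mean.
    return df.loc[row["StartTime"]:row["EndTime"], "Value"].mean()

df1["AvgValue"] = df1.apply(range_mean, axis=1)
print(df1)
```

Because both endpoints are included, a reading that sits exactly on a shared boundary (such as 13:00 here) would count toward both adjacent ranges.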
Upvotes: 1