Reputation: 95
I have the following dataframe:
data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}
df_data = pd.DataFrame(data)
df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S')
I am trying to find, for each day, the order that lies closest to the midpoint of a 15-minute time window. The window runs from 00:00:00 to 00:15:00, so its midpoint is 00:07:30. For the first day, that means computing the difference between the first order ('2019-01-01 0:00:00') and 00:07:30, and between the second order ('2019-01-01 00:11:00') and 00:07:30, and then keeping only the order that is closer to 00:07:30. I want to repeat this for each day.
I did the following:
t = 0
x = '00:00:00'
y = '00:15:00'
g = 0
a = []
for i in range(1, len(df_data)):
    g +=1
    half_time = (pd.Timestamp(y) - pd.Timstamp(x).to_pydatetime()) / 2
    half_window = (half_time + pd.Timestamp(x).to_pydatetime()).strftime('%H:%M:%S')
    for l in df_data['day_order']:
        for k in df_data['time_order']:
            if l == k.strftime('%Y-%m-%d')
                distance1 = abs(pd.Timestamp(df_data.iat[i-1, 4].to_pydatetime() - pd.Timestamp(half_window).to_pydatetime())
                distance2 = abs(pd.Timestamp(df_data.iat[i, 4].to_pydatetime() - pd.Timestamp(half_window).to_pydatetime())
                if distance1 < distance2:
                    d = distance1
                else:
                    d = distance2
                a.append(d.seconds)
The expected result for the first day is abs(00:11:00 - 00:07:30) = 00:03:30, which is less than abs(00:00:00 - 00:07:30) = 00:07:30. I therefore want to keep only the shorter distance, 00:03:30, and ignore the first order of that day. I would like to do this for each day. I tried it with the code above, but it doesn't work. Any ideas would be much appreciated. Thanks in advance.
Upvotes: 2
Views: 102
Reputation: 2430
I am not sure about the format of the expected output, but I would bring the result to a point where you can extract the data as you like.
Loading the given data:
import pandas as pd
data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}
df_data = pd.DataFrame(data)
df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S')
Calculating half of the window length (equal to the window midpoint here, because the window starts at 00:00:00):
x = '00:00:00'
y = '00:15:00'
diff = (pd.Timedelta(y)-pd.Timedelta(x))/2
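Creating a new column 'diff' as a timedelta (the 'time' column holds strings, so it is converted with pd.to_timedelta first):
df_data['diff'] = (pd.to_timedelta(df_data['time']) - diff).abs()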
Grouping by date and keeping the row(s) with the smallest difference in each group:
mins = df_data.groupby('day_order').apply(lambda x: x[x['diff']==min(x['diff'])])
Resetting the index (optional):
mins.reset_index(drop=True, inplace=True)
Output DataFrame:
>>> mins
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:03:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:03:30
2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:00:30
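As a possible alternative to the groupby/apply step, the closest order per day can also be picked with idxmin. This is only a sketch under a few assumptions that are not in the original answer: the time of day is derived directly from 'time_order' with dt.normalize() instead of the string 'time' column, the midpoint is hard-coded as 00:07:30, and the helper column 'diff_seconds' is a name made up for illustration. One behavioural difference: idxmin returns exactly one row per day, whereas the filter above keeps every row that ties for the minimum.

```python
import pandas as pd

# Same orders as in the question
df = pd.DataFrame({
    "id": [0, 0, 0, 0, 0, 0],
    "time_order": pd.to_datetime([
        "2019-01-01 00:00:00", "2019-01-01 00:11:00",
        "2019-01-02 00:04:00", "2019-01-02 00:15:00",
        "2019-01-03 00:07:00", "2019-01-03 00:10:00",
    ]),
})
df["day_order"] = df["time_order"].dt.strftime("%Y-%m-%d")

# Assumed midpoint of the 00:00:00-00:15:00 window
half_window = pd.Timedelta("00:07:30")

# Time elapsed since midnight, as a timedelta
time_of_day = df["time_order"] - df["time_order"].dt.normalize()

# Absolute distance to the midpoint, in seconds (hypothetical column name)
df["diff_seconds"] = (time_of_day - half_window).abs().dt.total_seconds()

# One row per day: the order with the smallest distance to the midpoint
closest = df.loc[df.groupby("day_order")["diff_seconds"].idxmin()].reset_index(drop=True)
```

Here closest holds the 00:11:00, 00:04:00 and 00:07:00 orders, the same rows as mins.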
Making a list of the differences in seconds:
a = list(mins['diff'].apply(lambda x:x.seconds))
Output:
>>> a
[210, 210, 30]
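A side note on .seconds: it returns only the seconds component of a timedelta and ignores whole days. That makes no difference for the values above, which are all well under a day, but total_seconds() may be the safer choice if the differences could ever exceed 24 hours. A small sketch with made-up values (the third one is not from the question):

```python
import pandas as pd

# Two differences from the example plus one hypothetical value longer than a day
diffs = pd.Series(pd.to_timedelta(["0 days 00:03:30", "0 days 00:00:30", "1 days 00:00:30"]))

# .dt.seconds drops the days part of each timedelta
component = diffs.dt.seconds.tolist()

# .dt.total_seconds() includes the days part
total = diffs.dt.total_seconds().astype(int).tolist()
```

The two lists agree on the first two entries and differ only on the one that is longer than a day.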
Upvotes: 1