Adam

Reputation: 95

Choosing the minimum distance

I have the following dataframe:

data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}

df_data = pd.DataFrame(data)

df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S') 

I am trying to find, for each day, the order that is closest to the midpoint of a 15-minute window. For example, with the window 00:00:00 to 00:15:00 the midpoint is 00:07:30, so I would like to calculate the difference between the first order ('2019-01-01 00:00:00') and 00:07:30, and between the second order ('2019-01-01 00:11:00') and 00:07:30, and keep only the order that is closer to 00:07:30 on each day.

I did the following:

t = 0
x = '00:00:00'
y = '00:15:00'
g = 0
a = []
for i in range(1, len(df_data)):
    g +=1
    half_time = (pd.Timestamp(y) - pd.Timestamp(x).to_pydatetime()) / 2
    half_window = (half_time + pd.Timestamp(x).to_pydatetime()).strftime('%H:%M:%S')
    for l in df_data['day_order']:
        for k in df_data['time_order']:
            if l == k.strftime('%Y-%m-%d'):
                distance1 = abs(pd.Timestamp(df_data.iat[i-1, 4]).to_pydatetime() - pd.Timestamp(half_window).to_pydatetime())
                distance2 = abs(pd.Timestamp(df_data.iat[i, 4]).to_pydatetime() - pd.Timestamp(half_window).to_pydatetime())
                if distance1 < distance2:
                    d = distance1 
                else:
                    d = distance2
    a.append(d.seconds)

So the expected result for the first day is abs(00:11:00 - 00:07:30) = 00:03:30, which is less than abs(00:00:00 - 00:07:30) = 00:07:30. I would therefore like to keep only the shorter distance, 00:03:30, and ignore the first order on that day, and do the same for each day. I tried this with my code above, but it doesn't work. Any ideas would be much appreciated. Thanks in advance.

Upvotes: 2

Views: 102

Answers (1)

Rishabh Kumar

Reputation: 2430

I am not sure about the format of the expected output, but I would try to bring the result to a point where you can extract data as you like:

Loading given data:

import pandas as pd
data = {'id': [0, 0, 0, 0, 0, 0],
'time_order': ['2019-01-01 0:00:00', '2019-01-01 00:11:00', '2019-01-02 00:04:00', '2019-01-02 00:15:00', '2019-01-03 00:07:00', '2019-01-03 00:10:00']}

df_data = pd.DataFrame(data)

df_data['time_order'] = pd.to_datetime(df_data['time_order'])
df_data['day_order'] = df_data['time_order'].dt.strftime('%Y-%m-%d')
df_data['time'] = df_data['time_order'].dt.strftime('%H:%M:%S') 

Calculating half of the window length (this is also the window midpoint, since the window starts at 00:00:00):

x = '00:00:00'
y = '00:15:00'
diff = (pd.Timedelta(y)-pd.Timedelta(x))/2
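Note that half of the window length only coincides with the midpoint when the window starts at midnight. A minimal sketch of the general case follows, assuming windows are given as 'HH:MM:SS' strings; `window_midpoint` is a hypothetical helper name, not something from the question or from pandas:

```python
import pandas as pd

def window_midpoint(start: str, end: str) -> pd.Timedelta:
    """Return the midpoint of a time window as an offset from midnight.

    Hypothetical helper for illustration; start and end are 'HH:MM:SS' strings.
    """
    start_td = pd.Timedelta(start)
    end_td = pd.Timedelta(end)
    # Half of the window length, shifted by the window start.
    return start_td + (end_td - start_td) / 2

first_window = window_midpoint('00:00:00', '00:15:00')   # 7 min 30 s after midnight
second_window = window_midpoint('00:15:00', '00:30:00')  # 22 min 30 s after midnight
```

For the window in the question the result is the same as `diff` above, so the rest of the answer is unaffected.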

Creating a new column 'diff' as timedelta (the 'time' column holds strings, so it is converted with pd.to_timedelta first):

df_data['diff'] = abs(pd.to_timedelta(df_data['time']) - diff)

Grouping by date and keeping the row(s) with the smallest difference in each group:

mins = df_data.groupby('day_order').apply(lambda x: x[x['diff']==min(x['diff'])])

Removing Index (optional):

mins.reset_index(drop=True, inplace=True)

Output DataFrame:

>>> mins
   id          time_order   day_order      time            diff
0   0 2019-01-01 00:11:00  2019-01-01  00:11:00 0 days 00:03:30
1   0 2019-01-02 00:04:00  2019-01-02  00:04:00 0 days 00:03:30
2   0 2019-01-03 00:07:00  2019-01-03  00:07:00 0 days 00:00:30
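As an alternative to groupby plus apply, the closest order per day can probably also be selected with idxmin. The sketch below is self-contained and rebuilds the data from the question; the column name `diff_seconds` and the variable names are my own choices, and the distance is stored as plain seconds so that idxmin works on a numeric column:

```python
import pandas as pd

df_data = pd.DataFrame({
    'id': [0, 0, 0, 0, 0, 0],
    'time_order': pd.to_datetime([
        '2019-01-01 00:00:00', '2019-01-01 00:11:00',
        '2019-01-02 00:04:00', '2019-01-02 00:15:00',
        '2019-01-03 00:07:00', '2019-01-03 00:10:00',
    ]),
})

# Midpoint of the 00:00:00-00:15:00 window, as an offset from midnight.
midpoint = pd.Timedelta('00:07:30')

# Midnight of each order's day, used both for the time of day and as the group key.
day = df_data['time_order'].dt.normalize()

# Time of day as a timedelta since midnight, then its absolute distance
# to the midpoint, expressed in seconds.
time_of_day = df_data['time_order'] - day
df_data['diff_seconds'] = (time_of_day - midpoint).abs().dt.total_seconds()

# idxmin gives the row label of the smallest distance within each day.
closest = df_data.loc[df_data.groupby(day)['diff_seconds'].idxmin()]
closest = closest.reset_index(drop=True)
```

One difference in behaviour: idxmin returns a single row per day, so if two orders are equally close to the midpoint only the first is kept, whereas the boolean filter above keeps all tied rows.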

Making list of difference in seconds:

a = list(mins['diff'].apply(lambda x:x.seconds))

Output:

>>> a
[210, 210, 30]
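A small caveat on `.seconds`: it returns only the seconds component of a timedelta and ignores whole days. That is fine here because every difference is well under a day, but `total_seconds()` is the safer choice in general. A short illustration with made-up values (not taken from the question):

```python
import pandas as pd

# Two illustrative timedeltas; the second one is longer than a day.
diffs = pd.Series(pd.to_timedelta(['0 days 00:03:30', '1 days 00:00:30']))

# .dt.seconds drops the days part of each value.
seconds_component = diffs.dt.seconds.tolist()

# .dt.total_seconds() converts the whole duration to seconds (as floats).
total_seconds = diffs.dt.total_seconds().tolist()
```

Here `seconds_component` is `[210, 30]` while `total_seconds` is `[210.0, 86430.0]`.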

Upvotes: 1
