Reputation: 8247
I have the following dataframe in pandas:
code srt_date srt_time end_time fina_datetime
123 2019-01-01 23:23:00 00:12:00 2019-01-02 00:13:00
123 2019-01-02 00:13:00 00:14:00 2019-01-02 00:15:00
123 2019-01-02 23:00:00 00:15:00 2019-01-03 00:16:00
I want to calculate fina_datetime - end_time, for which I am doing the following in pandas:
df['end_time'] = df['srt_date'].map(str) +" "+ df['end_time'].map(str)
df['end_time'] = pd.to_datetime(df['end_time'], format = "%Y-%m-%d %H:%M:%S")
df['latency_in_secs'] = [x-y for x, y in zip(df['fina_datetime'] , df['end_time'])]
df['latency_in_secs'] = df.latency_in_secs.dt.total_seconds()
The above code gives wrong results when end_time rolls over into the next day, e.g. the 1st and 3rd rows. How do I do it in pandas?
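To make the problem concrete, here is a minimal reproduction sketch. The DataFrame is rebuilt from the table above, and the names naive_end and naive_latency are only illustrative. It pairs end_time with srt_date the same way the code above does, so rows that cross midnight come out one day too large.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed to be strings)
df = pd.DataFrame({
    "code": [123, 123, 123],
    "srt_date": ["2019-01-01", "2019-01-02", "2019-01-02"],
    "srt_time": ["23:23:00", "00:13:00", "23:00:00"],
    "end_time": ["00:12:00", "00:14:00", "00:15:00"],
    "fina_datetime": ["2019-01-02 00:13:00",
                      "2019-01-02 00:15:00",
                      "2019-01-03 00:16:00"],
})

# Naive approach: attach end_time to srt_date without checking for rollover
naive_end = pd.to_datetime(df["srt_date"] + " " + df["end_time"],
                           format="%Y-%m-%d %H:%M:%S")
naive_latency = (pd.to_datetime(df["fina_datetime"]) - naive_end).dt.total_seconds()

# Rows 1 and 3 are off by a full day (86400 s) on top of the expected 60 s
print(naive_latency.tolist())
```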
My desired dataframe would be
code srt_date srt_time end_time fina_datetime latency_in_secs
123 2019-01-01 23:23:00 00:12:00 2019-01-02 00:13:00 60
123 2019-01-02 00:13:00 00:14:00 2019-01-02 00:15:00 60
123 2019-01-02 23:00:00 00:15:00 2019-01-03 00:16:00 60
Upvotes: 1
Views: 60
Reputation: 150825
IIUC, you can mask the rows where end_time <= srt_time and shift their date forward by one day:
# convert to timedelta
df['srt_time'] = pd.to_timedelta(df['srt_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
# convert to datetime
df['srt_date'] = pd.to_datetime(df['srt_date'])
df['fina_datetime'] = pd.to_datetime(df['fina_datetime'])
# the end datetime, assuming end_time falls on srt_date
end_dates = df['srt_date'] + df['end_time']
# shift the end datetime forward by one day where end_time <= srt_time
end_dates.loc[df['end_time'].le(df['srt_time'])] += pd.to_timedelta(1, unit='D')
# subtract:
df['latency_in_secs'] = (df['fina_datetime'].sub(end_dates)
.dt.total_seconds()
)
Output:
code srt_date srt_time end_time fina_datetime latency_in_secs
0 123 2019-01-01 23:23:00 00:12:00 2019-01-02 00:13:00 60.0
1 123 2019-01-02 00:13:00 00:14:00 2019-01-02 00:15:00 60.0
2 123 2019-01-02 23:00:00 00:15:00 2019-01-03 00:16:00 60.0
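For anyone who wants to run this end to end, below is a self-contained sketch of the same idea. It assumes the columns start out as strings, rebuilds the sample data from the question, and uses the illustrative names rollover and end_dt, which are not part of the original code. Unlike the snippet above, it leaves the original columns untouched.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed to be strings)
df = pd.DataFrame({
    "code": [123, 123, 123],
    "srt_date": ["2019-01-01", "2019-01-02", "2019-01-02"],
    "srt_time": ["23:23:00", "00:13:00", "23:00:00"],
    "end_time": ["00:12:00", "00:14:00", "00:15:00"],
    "fina_datetime": ["2019-01-02 00:13:00",
                      "2019-01-02 00:15:00",
                      "2019-01-03 00:16:00"],
})

# Parse into timedelta / datetime without overwriting the original columns
srt_time = pd.to_timedelta(df["srt_time"])
end_time = pd.to_timedelta(df["end_time"])
srt_date = pd.to_datetime(df["srt_date"])
fina = pd.to_datetime(df["fina_datetime"])

# Rows where the end time wrapped past midnight relative to the start time
rollover = end_time.le(srt_time)

# End datetime on srt_date, pushed forward one day for the rollover rows
end_dt = srt_date + end_time
end_dt = end_dt.mask(rollover, end_dt + pd.Timedelta(days=1))

df["latency_in_secs"] = (fina - end_dt).dt.total_seconds()
print(df)
```

Note that this only handles a rollover of at most one day. An interval longer than 24 hours cannot be detected from the times alone.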
Upvotes: 1