Pandas: calculate time elapsed between timestamp and current time, but only business hours and with timezone

Question

I am trying to use Pandas to calculate the number of business seconds elapsed. I have a column in a Pandas dataframe that has a bunch of timestamps in the NY timezone. Here is the code I have so far:

import datetime
import time

import pandas as pd
from pytz import timezone

times = pd.DataFrame([datetime.datetime.now(timezone('America/New_York')),datetime.datetime.now(timezone('America/New_York'))],columns=['timestamp'])
time.sleep(2)
# Subtract the timestamp column (not the whole DataFrame) from the current time
times['difference'] = (datetime.datetime.now(timezone('America/New_York')) - times['timestamp'])
times['difference'] = times['difference'].dt.seconds

This works as intended and gives 2 in the 'difference' column. Now I would like to count only business hours (say 9am to 5pm), so that the elapsed time between 5pm yesterday and 9am this morning is zero. I have read the Pandas documentation on time offsets and have looked for similar questions, but haven't found any examples that work.

Paulo Schau Guerra · Accepted Answer

You can achieve this by first checking whether a given timestamp falls within business hours (thanks to this thread) using the Pandas BusinessHour class, and then either calculating the time difference or assigning zero if the timestamp falls outside business hours.
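As a quick illustration of that check, here is a minimal sketch that assumes the 9am to 5pm window from the question and a few hand-picked sample timestamps (the samples are made up for the example). It uses `is_on_offset`, the current name of the method that older pandas versions exposed as `onOffset`:

```python
import pandas as pd

tz = "America/New_York"

# Monday-Friday, 09:00-17:00 (these are also the BusinessHour defaults)
biz_hours = pd.offsets.BusinessHour(start="09:00", end="17:00")

# Hypothetical sample timestamps around the edges of the window
samples = [
    pd.Timestamp("2021-05-18 08:59:59", tz=tz),  # Tuesday, just before opening
    pd.Timestamp("2021-05-18 09:00:00", tz=tz),  # Tuesday, opening time
    pd.Timestamp("2021-05-18 17:00:00", tz=tz),  # Tuesday, closing time
    pd.Timestamp("2021-05-18 17:00:01", tz=tz),  # Tuesday, just after closing
    pd.Timestamp("2021-05-22 12:00:00", tz=tz),  # Saturday, midday
]

# True where the timestamp lies inside the business-hour window
flags = [biz_hours.is_on_offset(ts) for ts in samples]

for ts, flag in zip(samples, flags):
    print(ts, flag)
```

Note that both ends of the window count as business time, so a timestamp at exactly 17:00:00 is still flagged as inside business hours.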

I have created a dummy dataset to test the code, as you can see below:

import pandas as pd
import time

# Sets the timezone
timezone = "America/New_York"

# Gets business hours from native Pandas class
biz_hours = pd.offsets.BusinessHour()

# Creates array with timestamps to test code
# (lowercase 's' is the per-second alias; uppercase 'S' is deprecated in recent pandas)
times_array = pd.date_range(start='2021-05-18 16:59:00', end='2021-05-18 17:01:00',
                            tz=timezone, freq='s')

# Creates DataFrame with timestamps
times = pd.DataFrame(times_array,columns=['timestamp'])

# Checks if a timestamp falls within business hours
# (is_on_offset replaces the deprecated onOffset)
times['is_biz_hour'] = times['timestamp'].apply(pd.Timestamp).apply(biz_hours.is_on_offset)

time.sleep(2)

# Calculates the time delta or assigns zero, as per business hour condition
# (total_seconds() is used because .seconds silently drops whole days)
times['difference'] = (times.apply(lambda x: int((pd.Timestamp.now(tz=timezone) - x['timestamp']).total_seconds())
                                   if x['is_biz_hour'] else 0,
                       axis=1))

The output is not perfect at the moment: each sample timestamp is subtracted from the current time, which was roughly 20 hours later when I ran this, so the differences are large:

    timestamp                   is_biz_hour  difference
57  2021-05-18 16:59:57-04:00   True         71238
58  2021-05-18 16:59:58-04:00   True         71237
59  2021-05-18 16:59:59-04:00   True         71236
60  2021-05-18 17:00:00-04:00   True         71235
61  2021-05-18 17:00:01-04:00   False        0
62  2021-05-18 17:00:02-04:00   False        0
63  2021-05-18 17:00:03-04:00   False        0
64  2021-05-18 17:00:04-04:00   False        0

However, you can see that the timestamps after 5 PM have a difference of 0, whereas the others have a valid difference.
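The check above only zeroes out timestamps that fall outside business hours; it does not exclude the non-business time between the timestamp and now. One possible way to count only business seconds is to clip the interval against each business day's window and add up the overlaps. The following is a minimal sketch, not part of the original answer: the helper name `business_seconds` and its parameters are hypothetical, it assumes both timestamps are timezone-aware, and it ignores holidays.

```python
import pandas as pd

def business_seconds(start, end, open_time="09:00", close_time="17:00"):
    """Seconds between start and end that fall inside Mon-Fri business hours.

    Hypothetical helper: assumes tz-aware pd.Timestamp inputs, ignores holidays.
    """
    end = end.tz_convert(start.tz)  # work in the timezone of `start`
    if end <= start:
        return 0.0
    first_day = start.normalize().tz_localize(None)
    last_day = end.normalize().tz_localize(None)
    total = 0.0
    # bdate_range yields only Monday-Friday dates between the two days
    for day in pd.bdate_range(first_day, last_day):
        opens = pd.Timestamp(f"{day.date()} {open_time}", tz=start.tz)
        closes = pd.Timestamp(f"{day.date()} {close_time}", tz=start.tz)
        # Overlap of [start, end] with this day's [opens, closes] window
        overlap = (min(end, closes) - max(start, opens)).total_seconds()
        total += max(overlap, 0.0)
    return total

tz = "America/New_York"
times = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-05-18 16:59:00",
                                  "2021-05-18 17:00:00",
                                  "2021-05-21 16:00:00"]).tz_localize(tz)}
)
# A fixed "now" keeps the example reproducible; use pd.Timestamp.now(tz=tz) in practice
now = pd.Timestamp("2021-05-24 10:00:00", tz=tz)
times["business_seconds"] = times["timestamp"].apply(lambda ts: business_seconds(ts, now))
print(times)
```

With this approach, the span from 5pm one day to 9am the next contributes zero, which is the behaviour the question asks for. Holidays could be handled by swapping `pd.bdate_range` for a custom business-day calendar, which is left out here to keep the sketch short.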


