Reputation: 361
I am trying to use Pandas to calculate the number of business seconds elapsed. I have a column in a Pandas dataframe that has a bunch of timestamps in the NY timezone. Here is the code I have so far:
import pandas as pd
import datetime
times = pd.DataFrame([datetime.datetime.now(timezone('America/New_York')),datetime.datetime.now(timezone('America/New_York'))],columns=['timestamp'])
time.sleep(2)
times['difference'] = (datetime.datetime.now(timezone('America/New_York')) - times)
times['difference'] = times['difference'].dt.seconds
This works as intended and gives the answer as 2 in the 'difference' column. But now I would like to only include business hours (say 9am to 5pm). So that the output between 5pm yesterday and 9am this morning is zero. I have read the Pandas documentation on time offsets and have looked for similar questions, but haven't found any examples that work.
Upvotes: 2
Views: 1440
Reputation: 2167
Here is a mathematical way to do it.
Let's investigate about working seconds for a TimeStamp some_time
.
We are going to need the TimeStamp of some_time
at midnight. Let's call it d_day
:
d_day = some_time.replace(hour=0, minute=0, second=0, microsecond=0)
First, let's define our working days. It start at 9 AM
and lasts until 5 PM
. In seconds, it gives:
start_time = 9*3600
max_work_time = (17-9) * 3600
Now, let's get the timestamp for now and the timestamp for today at midnight, in seconds.
now = datetime.now()
today = datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
seconds_today = (now - today).seconds
To get today's seconds of work, we must substract start_time
then keep at max max_work_time
:
worked_seconds_today = min(seconds_today - start_time, max_work_time)
But of course, we want to keep it only if today
and d-day
are different days, else we want to calculate the worked_seconds since some_time
:
secs_today = min(seconds_today - start_time, max_work_time) \
if today > d_day \
else min(seconds_today - start_time, max_work_time) - min((some_time - today).seconds - start_time, max_work_time)
We want full days of work only here. So let's simply do:
inbetween_days = max((datetime.today() - d_day).days - 1, 0)
Now we can simply calculate the worked seconds of full days by doing so:
secs_inbetween_days = inbetween_days * max_work_time
Finally we also want the worked seconds of d_day since some_time
. We can apply the same logic than for today. If today and some_time
are the same day, we simply put zero, as we already calculated today.
def worked_secs(x, since):
return min((x - since).seconds - start_time, max_work_time)
secs_day_d = max_work_time - worked_secs(some_time, d_day) if today != d_day else 0
The total is the sum of the three previous components:
total = secs_day_d + secs_inbetween_days + secs_today
def busy_seconds(some_time):
# Outside the function is OK also
start_time = 9*3600
max_work_time = (17-9)*3600
# We must calculate all times with the same timezone
tz = some_time.tz
now = datetime.now(tz=tz) # now
today = now.replace(hour=0, minute=0, second=0, microsecond=0) # today at midnight
d_day = some_time.replace(hour=0, minute=0, second=0, microsecond=0) # d-day at midnight
def worked_secs(x, since): # a function is more convenient
return min((x - since).seconds - start_time, max_work_time)
n_complete_days = max((today - d_day).days - 1, 0)
secs_day_d = max_work_time - worked_secs(some_time, d_day) if today != d_day else 0
secs_inbetween_days = max_work_time * n_complete_days
secs_today = worked_secs(now, today) \
if d_day < today \
else worked_secs(now, today) - worked_secs(some_time, today)
return secs_day_d + secs_inbetween_days + secs_today
We can apply this function to the column:
times.timestamp.apply(busy_seconds)
# > (Example)
# 0 67420800
# 1 57340800
# 2 28800
# Name: timestamp, dtype: int64
Upvotes: 1
Reputation: 631
You can achieve this by first checking whether a given timestamp is within business hours (thanks to this thread) using Pandas BusinessHour class and then calculating the time difference or assigning a zero if the timestamp falls outside of business hours.
I have created a dummy dataset to test the code, as you can see below:
import pandas as pd
import time
# Sets the timezone
timezone = "America/New_York"
# Gets business hours from native Pandas class
biz_hours = pd.offsets.BusinessHour()
# Creates array with timestamps to test code
times_array = pd.date_range(start='2021-05-18 16:59:00', end='2021-05-18 17:01:00',
tz=timezone, freq='S')
# Creates DataFrame with timestamps
times = pd.DataFrame(times_array,columns=['timestamp'])
# Checks if a timestamp falls within business hours
times['is_biz_hour'] = times['timestamp'].apply(pd.Timestamp).apply(biz_hours.onOffset)
time.sleep(2)
# Calculates the time delta or assign zero, as per business hour condition
times['difference'] = (times.apply(lambda x: (pd.Timestamp.now(tz=timezone) - x['timestamp']).seconds
if x['is_biz_hour'] else 0,
axis=1))
The output is not perfect at the moment, because it subtracts the timestamp from the time now, thus amounting to a large difference:
timestamp is_biz_hour difference
57 2021-05-18 16:59:57-04:00 True 71238
58 2021-05-18 16:59:58-04:00 True 71237
59 2021-05-18 16:59:59-04:00 True 71236
60 2021-05-18 17:00:00-04:00 True 71235
61 2021-05-18 17:00:01-04:00 False 0
62 2021-05-18 17:00:02-04:00 False 0
63 2021-05-18 17:00:03-04:00 False 0
64 2021-05-18 17:00:04-04:00 False 0
However, you can see that the timestamps after 5 PM have a difference of 0, whereas the others have a valid difference.
Upvotes: 5