helloimgeorgia
helloimgeorgia

Reputation: 361

Pandas: calculate time elapsed between timestamp and current time, but only business hours and with timezone

I am trying to use Pandas to calculate the number of business seconds elapsed. I have a column in a Pandas dataframe that has a bunch of timestamps in the NY timezone. Here is the code I have so far:

import pandas as pd
import datetime

times = pd.DataFrame([datetime.datetime.now(timezone('America/New_York')),datetime.datetime.now(timezone('America/New_York'))],columns=['timestamp'])
time.sleep(2)
times['difference'] = (datetime.datetime.now(timezone('America/New_York')) - times)
times['difference'] = times['difference'].dt.seconds

This works as intended and gives the answer as 2 in the 'difference' column. But now I would like to only include business hours (say 9am to 5pm). So that the output between 5pm yesterday and 9am this morning is zero. I have read the Pandas documentation on time offsets and have looked for similar questions, but haven't found any examples that work.

Upvotes: 2

Views: 1440

Answers (2)

Whole Brain
Whole Brain

Reputation: 2167

Here is a mathematical way to do it.

Step by step

Let's investigate about working seconds for a TimeStamp some_time. We are going to need the TimeStamp of some_time at midnight. Let's call it d_day:

d_day = some_time.replace(hour=0, minute=0, second=0, microsecond=0)

Today

First, let's define our working days. It start at 9 AM and lasts until 5 PM. In seconds, it gives:

start_time = 9*3600
max_work_time = (17-9) * 3600

Now, let's get the timestamp for now and the timestamp for today at midnight, in seconds.

now = datetime.now()
today = datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
seconds_today = (now - today).seconds

To get today's seconds of work, we must substract start_time then keep at max max_work_time:

worked_seconds_today = min(seconds_today - start_time, max_work_time)

But of course, we want to keep it only if today and d-day are different days, else we want to calculate the worked_seconds since some_time:

secs_today = min(seconds_today - start_time, max_work_time) \
    if today > d_day \
    else min(seconds_today - start_time, max_work_time) - min((some_time - today).seconds - start_time, max_work_time)

Days in between d-day and today

We want full days of work only here. So let's simply do:

 inbetween_days = max((datetime.today() - d_day).days - 1, 0)

Now we can simply calculate the worked seconds of full days by doing so:

secs_inbetween_days = inbetween_days * max_work_time

D-day

Finally we also want the worked seconds of d_day since some_time. We can apply the same logic than for today. If today and some_timeare the same day, we simply put zero, as we already calculated today.

def worked_secs(x, since):
    return min((x - since).seconds - start_time, max_work_time)
secs_day_d = max_work_time - worked_secs(some_time, d_day) if today != d_day else 0

Total

The total is the sum of the three previous components:

total = secs_day_d + secs_inbetween_days + secs_today

Final function

def busy_seconds(some_time):

    # Outside the function is OK also
    start_time = 9*3600
    max_work_time = (17-9)*3600

    # We must calculate all times with the same timezone
    tz = some_time.tz

    now = datetime.now(tz=tz) # now
    today = now.replace(hour=0, minute=0, second=0, microsecond=0) # today at midnight
    d_day = some_time.replace(hour=0, minute=0, second=0, microsecond=0) # d-day at midnight

    def worked_secs(x, since): # a function is more convenient
        return min((x - since).seconds - start_time, max_work_time)

    n_complete_days = max((today - d_day).days - 1, 0)

    secs_day_d = max_work_time - worked_secs(some_time, d_day) if today != d_day else 0
    secs_inbetween_days =  max_work_time * n_complete_days
    secs_today = worked_secs(now, today) \
        if d_day < today \
        else worked_secs(now, today) - worked_secs(some_time, today)

    return  secs_day_d + secs_inbetween_days + secs_today

Finally

We can apply this function to the column:

times.timestamp.apply(busy_seconds)
# > (Example)
# 0    67420800
# 1    57340800
# 2       28800
# Name: timestamp, dtype: int64

Upvotes: 1

Paulo Schau Guerra
Paulo Schau Guerra

Reputation: 631

You can achieve this by first checking whether a given timestamp is within business hours (thanks to this thread) using Pandas BusinessHour class and then calculating the time difference or assigning a zero if the timestamp falls outside of business hours.

I have created a dummy dataset to test the code, as you can see below:

import pandas as pd
import time

# Sets the timezone
timezone = "America/New_York"

# Gets business hours from native Pandas class
biz_hours = pd.offsets.BusinessHour()

# Creates array with timestamps to test code
times_array = pd.date_range(start='2021-05-18 16:59:00', end='2021-05-18 17:01:00',
                            tz=timezone, freq='S')

# Creates DataFrame with timestamps
times = pd.DataFrame(times_array,columns=['timestamp'])

# Checks if a timestamp falls within business hours                           
times['is_biz_hour'] = times['timestamp'].apply(pd.Timestamp).apply(biz_hours.onOffset)

time.sleep(2)

# Calculates the time delta or assign zero, as per business hour condition
times['difference'] = (times.apply(lambda x: (pd.Timestamp.now(tz=timezone) - x['timestamp']).seconds
                                   if x['is_biz_hour'] else 0,
                       axis=1))

The output is not perfect at the moment, because it subtracts the timestamp from the time now, thus amounting to a large difference:

    timestamp                   is_biz_hour  difference
57  2021-05-18 16:59:57-04:00   True         71238
58  2021-05-18 16:59:58-04:00   True         71237
59  2021-05-18 16:59:59-04:00   True         71236
60  2021-05-18 17:00:00-04:00   True         71235
61  2021-05-18 17:00:01-04:00   False        0
62  2021-05-18 17:00:02-04:00   False        0
63  2021-05-18 17:00:03-04:00   False        0
64  2021-05-18 17:00:04-04:00   False        0

However, you can see that the timestamps after 5 PM have a difference of 0, whereas the others have a valid difference.

Upvotes: 5

Related Questions