Reputation: 179
I have a dataset that contains two columns. One is the Message Received Timestamp
and another is the Message Sent Timestamp
. The format of the time look like such 2021-06-05T07:00:34Z
. I would like to calculate the time difference in seconds between those two columns and make sure the difference is less than 24 hours
by excluding business days.
I am aware of the np.busday_count
method but I would like to calculate the difference in seconds. Please see my current code below which fails to consider the business days: (pd.to_datetime(df["Message Received Timestamp"]) - pd.to_datetime(df["Message Sent Timestamp"])) < pd.Timedelta("1 day")
I would really appreciate your help to factor in the business days calculation (in seconds). Thank you so much.
Upvotes: 1
Views: 484
Reputation: 23227
You can use Timedelta.total_seconds()
to get the total time difference in seconds between the 2 dates as follows:
df["Diff_Total_Seconds"] = (pd.to_datetime(df["Message Received Timestamp"]) - pd.to_datetime(df["Message Sent Timestamp"])).dt.total_seconds()
Then calculate the total seconds for business days by multiplying np.busday_count
with 86400 (seconds in one day):
df["Busday_Total_Seconds"] = (np.busday_count(pd.to_datetime(df["Message Sent Timestamp"]).values.astype('datetime64[D]'), pd.to_datetime(df["Message Received Timestamp"]).values.astype('datetime64[D]')) - 1) * 86400
Finally, get the time difference in seconds excluding business days as their difference.
df["Diff_Total_Seconds_Excl_Busday"] = df["Diff_Total_Seconds"] - df["Busday_Total_Seconds"]
Test Result:
Message Received Timestamp Message Sent Timestamp Diff_Total_Seconds Busday_Total_Seconds Diff_Total_Seconds_Excl_Busday
0 2021-06-07T06:41:01Z 2021-06-04T07:00:34Z 258027.0 0 258027.0
1 2021-06-09T06:41:01Z 2021-06-08T07:00:34Z 85227.0 0 85227.0
After further discussion with OP, we come up with updated codes with handling of some special cases.
df["Message Received Timestamp"] = pd.to_datetime(df["Message Received Timestamp"])
df["Message Sent Timestamp"] = pd.to_datetime(df["Message Sent Timestamp"])
df["Diff_Total_Sec"] = (df["Message Received Timestamp"] - df["Message Sent Timestamp"]).dt.total_seconds()
df["Diff_Days"] = (df["Message Received Timestamp"].dt.normalize() - df["Message Sent Timestamp"].dt.normalize()).dt.days
df["NonBDay_Total_Sec"] = (df["Diff_Days"] - np.busday_count(df["Message Sent Timestamp"].values.astype('datetime64[D]'), df["Message Received Timestamp"].values.astype('datetime64[D]'))) * 86400
df["NonBDay_Total_Sec"] = np.where((df["Message Received Timestamp"].dt.dayofweek == 6) & (df["NonBDay_Total_Sec"] < 86400 * 2), 86400 * 2, df["NonBDay_Total_Sec"])
df["NonBDay_Total_Sec"] = np.where((df["Message Received Timestamp"].dt.dayofweek == 5) & (df["NonBDay_Total_Sec"] < 86400), 86400, df["NonBDay_Total_Sec"])
df["Diff_Total_Sec_Excl_NonBDay"] = df["Diff_Total_Sec"] - df["NonBDay_Total_Sec"]
df["Validity_Result"] = (df["Diff_Total_Sec_Excl_NonBDay"] < 86400) & (df["Message Sent Timestamp"] <= df["Message Received Timestamp"])
Test Result:
Message Received Timestamp Message Sent Timestamp Diff_Total_Sec Diff_Days NonBDay_Total_Sec Diff_Total_Sec_Excl_NonBDay Validity_Result
0 2021-06-07 06:41:01+00:00 2021-06-04 07:00:34+00:00 258027.0 3 172800 85227.0 True
1 2021-06-07 06:41:01+00:00 2021-06-07 05:41:01+00:00 3600.0 0 0 3600.0 True
2 2021-06-07 06:41:01+00:00 2021-06-07 07:41:01+00:00 -3600.0 0 0 -3600.0 False
3 2021-06-07 06:41:01+00:00 2021-06-03 06:41:01+00:00 345600.0 4 172800 172800.0 False
4 2021-06-07 06:41:01+00:00 2021-06-10 06:41:01+00:00 -259200.0 -3 0 -259200.0 False
5 2021-06-07 06:41:01+00:00 2021-06-03 07:00:01+00:00 344460.0 4 172800 171660.0 False
6 2021-06-07 06:41:01+00:00 2021-06-04 07:00:01+00:00 258060.0 3 172800 85260.0 True
7 2021-06-07 06:41:01+00:00 2021-06-04 05:00:01+00:00 265260.0 3 172800 92460.0 False
8 2021-06-06 22:01:03+00:00 2021-06-04 09:38:38+00:00 217345.0 2 172800 44545.0 True
9 2021-06-05 07:01:36+00:00 2021-06-04 03:49:42+00:00 97914.0 1 86400 11514.0 True
10 2021-06-07 01:10:09+00:00 2021-06-03 03:32:06+00:00 337083.0 4 172800 164283.0 False
11 2021-06-06 09:03:34+00:00 2021-06-04 08:36:36+00:00 174418.0 2 172800 1618.0 True
Upvotes: 1