Reputation: 621
I have a data frame like this:
datetime1 datetime2
0 2021-05-09 19:52:14 2021-05-09 20:52:14
1 2021-05-09 19:52:14 2021-05-09 21:52:14
I want to compare them and create a new column with the difference between them:
The desirable output is like below:
datetime1 datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00
EDIT:
@Andrej that solution that you gave me worked perfectly when I had timestamps in both datetime1 and2. If I have a df like below, its is failing because it has nothing to compare
df1:
datetime1 datetime2
0 2021-05-09 19:52:14 2021-05-09 20:52:14
1 2021-05-09 19:52:14 2021-05-09 21:52:14
2 NaN NaN
3 2021-05-09 16:30:14 NaN
4 NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14
df2 (desirable output):
datetime1 datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00 NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00 NaN
2 NaN NaN NaN NaN
3 2021-05-09 16:30:14 NaN NaN e.g(04:00:00)
4 NaN NaN NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14 02:00:00 NaN
In a real scenario I have a case that I don't have values in datetime1 and datetime2, or I have values in datatime1 but I don't in datatime2, so is there is a possible way to get NaN in "difference" column if there is no timestamp in datetime1 and 2, and if there is a timestamp only in datetime1, get the difference compared to datetime.now() and put that in another column.
Upvotes: 2
Views: 219
Reputation: 195418
Try:
def strfdelta(tdelta, fmt):
d = {"days": tdelta.days}
d["hours"], rem = divmod(tdelta.seconds, 3600)
d["minutes"], d["seconds"] = divmod(rem, 60)
return fmt.format(**d)
# if datetime1/datetime2 aren't already datetime, apply `.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])
df["Difference in H:m:s"] = df.apply(
lambda x: strfdelta(
x["datetime2"] - x["datetime1"],
"{hours:02d}:{minutes:02d}:{seconds:02d}",
),
axis=1,
)
print(df)
Prints:
datetime1 datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00
EDIT: To handle NaN
s:
# if datetime1/datetime2 aren't already datetime, apply `.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])
df["Difference in H:m:s"] = df.apply(
lambda x: strfdelta(
x["datetime2"] - x["datetime1"],
"{hours:02d}:{minutes:02d}:{seconds:02d}",
)
if pd.notna(x["datetime1"]) and pd.notna(x["datetime2"])
else np.nan,
axis=1,
)
df["Compared with datetime.now()"] = df.apply(
lambda x: strfdelta(
pd.Timestamp.now() - x["datetime1"],
"{hours:02d}:{minutes:02d}:{seconds:02d}",
)
if pd.notna(x["datetime1"]) & pd.isna(x["datetime2"])
else np.nan,
axis=1,
)
print(df)
Prints:
datetime1 datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14 01:00:00 NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14 02:00:00 NaN
2 NaT NaT NaN NaN
3 2021-05-09 16:30:14 NaT NaN 03:00:20
4 NaT NaT NaN NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14 02:00:00 NaN
Upvotes: 1