user14073111
user14073111

Reputation: 621

Comparing two columns in a dataframe that contains date and time and give the difference in another column

I have a data frame like this:

         datetime1                datetime2             
0   2021-05-09 19:52:14      2021-05-09 20:52:14  
1   2021-05-09 19:52:14      2021-05-09 21:52:14  

I want to compare them and create a new column with the difference between them:

The desirable output is like below:

         datetime1                datetime2              Difference in H:m:s
0   2021-05-09 19:52:14      2021-05-09 20:52:14                  01:00:00
1   2021-05-09 19:52:14      2021-05-09 21:52:14                  02:00:00

EDIT:

@Andrej that solution that you gave me worked perfectly when I had timestamps in both datetime1 and2. If I have a df like below, its is failing because it has nothing to compare

df1:

         datetime1                datetime2             
0   2021-05-09 19:52:14      2021-05-09 20:52:14  
1   2021-05-09 19:52:14      2021-05-09 21:52:14 
2           NaN                      NaN
3  2021-05-09 16:30:14               NaN
4           NaN                      NaN
5  2021-05-09 12:30:14        2021-05-09 14:30:14

df2 (desirable output):

         datetime1            datetime2        Difference in H:m:s    Compared with datetime.now()
0   2021-05-09 19:52:14  2021-05-09 20:52:14         01:00:00           NaN
1   2021-05-09 19:52:14  2021-05-09 21:52:14         02:00:00           NaN
2           NaN               NaN                      NaN              NaN
3   2021-05-09 16:30:14       NaN                      NaN       e.g(04:00:00)
4           NaN               NaN                      NaN              NaN
5  2021-05-09 12:30:14   2021-05-09 14:30:14         02:00:00           NaN

In a real scenario I have a case that I don't have values in datetime1 and datetime2, or I have values in datatime1 but I don't in datatime2, so is there is a possible way to get NaN in "difference" column if there is no timestamp in datetime1 and 2, and if there is a timestamp only in datetime1, get the difference compared to datetime.now() and put that in another column.

Upvotes: 2

Views: 219

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195418

Try:

def strfdelta(tdelta, fmt):
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


# if datetime1/datetime2 aren't already datetime, apply `.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(
        x["datetime2"] - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    ),
    axis=1,
)
print(df)

Prints:

            datetime1           datetime2 Difference in H:m:s
0 2021-05-09 19:52:14 2021-05-09 20:52:14            01:00:00
1 2021-05-09 19:52:14 2021-05-09 21:52:14            02:00:00

EDIT: To handle NaNs:

# if datetime1/datetime2 aren't already datetime, apply `.to_datetime()`:
df["datetime1"] = pd.to_datetime(df["datetime1"])
df["datetime2"] = pd.to_datetime(df["datetime2"])

df["Difference in H:m:s"] = df.apply(
    lambda x: strfdelta(
        x["datetime2"] - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    )
    if pd.notna(x["datetime1"]) and pd.notna(x["datetime2"])
    else np.nan,
    axis=1,
)

df["Compared with datetime.now()"] = df.apply(
    lambda x: strfdelta(
        pd.Timestamp.now() - x["datetime1"],
        "{hours:02d}:{minutes:02d}:{seconds:02d}",
    )
    if pd.notna(x["datetime1"]) & pd.isna(x["datetime2"])
    else np.nan,
    axis=1,
)

print(df)

Prints:

            datetime1           datetime2 Difference in H:m:s Compared with datetime.now()
0 2021-05-09 19:52:14 2021-05-09 20:52:14            01:00:00                          NaN
1 2021-05-09 19:52:14 2021-05-09 21:52:14            02:00:00                          NaN
2                 NaT                 NaT                 NaN                          NaN
3 2021-05-09 16:30:14                 NaT                 NaN                     03:00:20
4                 NaT                 NaT                 NaN                          NaN
5 2021-05-09 12:30:14 2021-05-09 14:30:14            02:00:00                          NaN

Upvotes: 1

Related Questions