Joehat
Joehat

Reputation: 1129

Python: Match same date format in two date time columns

I have the following dataframe with two dates where one of them has the timezone included.

df = pd.DataFrame(np.array([[10, "2021-06-13 12:08:52.311 UTC", "2021-03-29 12:44:33.468"], 
                            [36, "2019-12-07 12:18:02.311 UTC", "2011-10-15 10:14:32.118"]
                           ]),
                   columns=['col1', 'date1', 'date2'])
df

Here's how I am converting them from a string to datetime:

df["date1"]= pd.to_datetime(df["date1"])
df["date2"]= pd.to_datetime(df["date2"])

which returns:

   col1  date1                              date2
0   10  2021-06-13 12:08:52.311000+00:00    2021-03-29 12:44:33.468
1   36  2019-12-07 12:18:02.311000+00:00    2011-10-15 10:14:32.118

At some point, I need to compare these two dates to look for the same values. For this, I need them to be written in the same format with the same number of digits. This said, how could I remove the time zone from date1 so it matches the same format in date2?

utc=None as default in pd.to_datetime, so that didn't work...

I am assuming both dates are in UTC. In the original data, these are part of different datasets, that's why they have a different format.

Upvotes: 1

Views: 625

Answers (1)

not_speshal
not_speshal

Reputation: 23146

You can do one of two things:

df["date1"]= pd.to_datetime(df["date1"], format="%Y-%m-%d %H:%M:%S.%f UTC")
df["date2"]= pd.to_datetime(df["date2"], format="%Y-%m-%d %H:%M:%S.%f")
>>> df
  col1                   date1                   date2
0   10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1   36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118

Or:

df["date1"]= pd.to_datetime(df["date1"].str.replace(" UTC", ""))
df["date2"]= pd.to_datetime(df["date2"])
>>> df
 col1                   date1                   date2
0   10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1   36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118

Upvotes: 1

Related Questions