Reputation: 1129
I have the following dataframe with two dates where one of them has the timezone included.
df = pd.DataFrame(np.array([[10, "2021-06-13 12:08:52.311 UTC", "2021-03-29 12:44:33.468"],
[36, "2019-12-07 12:18:02.311 UTC", "2011-10-15 10:14:32.118"]
]),
columns=['col1', 'date1', 'date2'])
df
Here's how I am converting them from a string to datetime:
df["date1"]= pd.to_datetime(df["date1"])
df["date2"]= pd.to_datetime(df["date2"])
which returns:
col1 date1 date2
0 10 2021-06-13 12:08:52.311000+00:00 2021-03-29 12:44:33.468
1 36 2019-12-07 12:18:02.311000+00:00 2011-10-15 10:14:32.118
At some point, I need to compare these two dates to look for the same values. For this, I need them to be written in the same format with the same number of digits. This said, how could I remove the time zone from date1 so it matches the same format in date2?
utc=None
as default in pd.to_datetime
, so that didn't work...
I am assuming both dates are in UTC. In the original data, these are part of different datasets, that's why they have a different format.
Upvotes: 1
Views: 625
Reputation: 23146
You can do one of two things:
df["date1"]= pd.to_datetime(df["date1"], format="%Y-%m-%d %H:%M:%S.%f UTC")
df["date2"]= pd.to_datetime(df["date2"], format="%Y-%m-%d %H:%M:%S.%f")
>>> df
col1 date1 date2
0 10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1 36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118
Or:
df["date1"]= pd.to_datetime(df["date1"].str.replace(" UTC", ""))
df["date2"]= pd.to_datetime(df["date2"])
>>> df
col1 date1 date2
0 10 2021-06-13 12:08:52.311 2021-03-29 12:44:33.468
1 36 2019-12-07 12:18:02.311 2011-10-15 10:14:32.118
Upvotes: 1