Reputation: 331
I need to analyse a large dataset with dates formatted in several different formats:
Mon, 04 Nov 2019 06:12:44 -0800 (PST)
Mon, 4 Nov 2019 15:16:58 +0100 (CET)
Mon, 4 Nov 2019 08:03:13 +0000 (UTC)
Mon, 4 Nov 2019 12:05:54 +0100
dfMail.Date = pd.to_datetime(dfMail.Date, format = "%a, %d %b %Y %H:%M:%S %z")
returns error: ValueError: unconverted data remains: (PST)
What is the best strategy to convert these dates?
Thanks
Upvotes: 1
Views: 510
Reputation: 150735
I see that the ()
extension might be troublesome. In which case, you can just ignore it:
pd.to_datetime(dfMail.Date.str.replace('( \(.*\))', ''), utc=True)
Input:
Date
0 Mon, 04 Nov 2019 06:12:44 -0800 (PST)
1 Mon, 4 Nov 2019 15:16:58 +0100 (CET)
2 Mon, 4 Nov 2019 08:03:13 +0000 (UTC)
3 Mon, 4 Nov 2019 12:05:54 +0100
4 Thu, 17 Oct 2019 23:19:41 +0100 (GMT+01:00)
Output:
0 2019-11-04 14:12:44+00:00
1 2019-11-04 14:16:58+00:00
2 2019-11-04 08:03:13+00:00
3 2019-11-04 11:05:54+00:00
4 2019-10-17 22:19:41+00:00
Name: 0, dtype: datetime64[ns, UTC]
Upvotes: 1