andrej
andrej

Reputation: 331

pd.to_datetime with different date formats

I need to analyse a large dataset with dates formatted in several different formats:

Mon, 04 Nov 2019 06:12:44 -0800 (PST)
Mon,  4 Nov 2019 15:16:58 +0100 (CET)
Mon,  4 Nov 2019 08:03:13 +0000 (UTC)
Mon,  4 Nov 2019 12:05:54 +0100

dfMail.Date = pd.to_datetime(dfMail.Date, format = "%a, %d %b %Y %H:%M:%S %z")

returns error: ValueError: unconverted data remains: (PST)

What is the best strategy to convert these dates?

Thanks

Upvotes: 1

Views: 510

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

I see that the () extension might be troublesome. In which case, you can just ignore it:

pd.to_datetime(dfMail.Date.str.replace('( \(.*\))', ''), utc=True)

Input:

                                          Date
0        Mon, 04 Nov 2019 06:12:44 -0800 (PST)
1        Mon,  4 Nov 2019 15:16:58 +0100 (CET)
2        Mon,  4 Nov 2019 08:03:13 +0000 (UTC)
3              Mon,  4 Nov 2019 12:05:54 +0100
4  Thu, 17 Oct 2019 23:19:41 +0100 (GMT+01:00)

Output:

0   2019-11-04 14:12:44+00:00
1   2019-11-04 14:16:58+00:00
2   2019-11-04 08:03:13+00:00
3   2019-11-04 11:05:54+00:00
4   2019-10-17 22:19:41+00:00
Name: 0, dtype: datetime64[ns, UTC]

Upvotes: 1

Related Questions