davo777
davo777

Reputation: 336

Manipulating datetime formats in pandas for Athena

Due to the way Pandas and Athena read datetime formats, I've had to manipulate several columns in order to get the data in a format that works.

Which involves converting a string to a datetime

df["serviceStartTimestamp"] = pd.to_datetime(df["serviceStartTimestamp"])

returning

0   2019-08-07 08:04:43.942000+00:00
1   2019-08-07 08:13:48.397000+00:00
2   2019-08-07 09:02:29.067000+00:00

then running this,

df["serviceStartTimestamp"] = pd.to_datetime(df["serviceStartTimestamp"].astype("str").str.split("+", expand=True)[0])

returning

0   2019-08-07 08:04:43.942
1   2019-08-07 08:13:48.397
2   2019-08-07 09:02:29.067

If the datetime columns include any further digits they get completed messed up in athena, but how I'm currently solving it seems inefficient, so I was wondering if there was a better way?

Upvotes: 1

Views: 783

Answers (1)

harpan
harpan

Reputation: 8631

You need to pass format='%Y-%m-%d %H:%M:%S.%f'.

pd.to_datetime(df["serviceStartTimestamp"]).dt.strftime('%Y-%m-%d %H:%M:%S.%f')

Output:

0   2019-08-07 08:04:43.942
1   2019-08-07 08:13:48.397
2   2019-08-07 09:02:29.067

Upvotes: 1

Related Questions