Reputation: 346
So I have a issue around dates that are coming from a excel sheet which I'm transforming into a CSV and then loading into a data frame. Basically the data I'm dealing with each day can come in two different formats. These two date columns are called Appointment Date
and Attended Date
I'm dealing with (DD/MM/YYYY HH:MM) and (YYYY/MM/DD HH:MM) and its coming from a third party so I cant set the date format structure. What i need to do is parse the data and remove the HH:MM and output the data only has DD/MM/YYYY.
My current code is currently the following:
df['Appointment Date'] = df['Appointment Date'].str.replace(' ', '/', regex=True)
df['Attended Date'] = df['Attended Date'].str.replace(' ', '/', regex=True)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
df['Attended Date'] = pd.to_datetime(df['Attended Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
But I'm not able to parse the data when it comes through as YYYY/MM/DD HH:MM Exception error:
time data '2021-10-08/00:00:00' does not match format '%d/%m/%Y/%H:%M' (match)
Any ideas on how i can get around this?
Upvotes: 0
Views: 2366
Reputation: 12558
Try it one way, and if it doesn't work, try it the other way.
try:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")
Of course, instead of WhateverDateParseException
use the actual exception that is raised in your code.
Edit: fixed missing "%S"
Upvotes: 2
Reputation: 79620
The format, %d/%m/%Y/%H:%M
does not match with the Date-Time string, 2021-10-08/00:00:00
. You need to use %Y-%m-%d/%H:%M:%S
for this Date-Time string.
Demo:
from datetime import datetime
date_time_str = '2021-10-08/00:00:00'
date_str = datetime.strptime(date_time_str, '%Y-%m-%d/%H:%M:%S').strftime('%d/%m/%Y')
print(date_str)
Output:
08/10/2021
Upvotes: 2
Reputation: 103
As mentioned by @C14L that method can be followed but my guess seeing your exception is you need to add a seconds format (%S) to your time formatting, so the updated code wld be like
try:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")
Upvotes: 2
Reputation: 36873
I would use regular expressions for that as follows:
import pandas as pd
df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})
df["daypart"] = df["daytime"].str.replace(r" \d\d:\d\d","") # drop HH:MM part
df["day"] = df["daypart"].str.replace(r"(\d\d\d\d)/(\d\d)/(\d\d)", r"\3/\2/\1")
print(df)
output
daytime daypart day
0 31/12/2020 23:59 31/12/2020 31/12/2020
1 2020/12/31 23:59 2020/12/31 31/12/2020
Explanation: I used so-called capturing groups in second .replace
, if there is (4 digits)/(2 digits)/(2 digits) their order is re-arranged that 3rd become 1st, 2nd become 2nd and 1st become 3rd (note that group are 1-based, not 0-base like is case with general python
indexing). AS day
format is now consistent you could be able to parse it easily.
Upvotes: 2