Mizanur Choudhury
Mizanur Choudhury

Reputation: 346

Dealing with different date formats in python

So I have a issue around dates that are coming from a excel sheet which I'm transforming into a CSV and then loading into a data frame. Basically the data I'm dealing with each day can come in two different formats. These two date columns are called Appointment Date and Attended Date

I'm dealing with (DD/MM/YYYY HH:MM) and (YYYY/MM/DD HH:MM) and its coming from a third party so I cant set the date format structure. What i need to do is parse the data and remove the HH:MM and output the data only has DD/MM/YYYY.

My current code is currently the following:

df['Appointment Date'] = df['Appointment Date'].str.replace(' ', '/', regex=True)
df['Attended Date'] = df['Attended Date'].str.replace(' ', '/', regex=True)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
df['Attended Date'] = pd.to_datetime(df['Attended Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")

But I'm not able to parse the data when it comes through as YYYY/MM/DD HH:MM Exception error:

time data '2021-10-08/00:00:00' does not match format '%d/%m/%Y/%H:%M' (match)

Any ideas on how i can get around this?

Upvotes: 0

Views: 2366

Answers (4)

C14L
C14L

Reputation: 12558

Try it one way, and if it doesn't work, try it the other way.

try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")

Of course, instead of WhateverDateParseException use the actual exception that is raised in your code.

Edit: fixed missing "%S"

Upvotes: 2

Arvind Kumar Avinash
Arvind Kumar Avinash

Reputation: 79620

The format, %d/%m/%Y/%H:%M does not match with the Date-Time string, 2021-10-08/00:00:00. You need to use %Y-%m-%d/%H:%M:%S for this Date-Time string.

Demo:

from datetime import datetime

date_time_str = '2021-10-08/00:00:00'
date_str = datetime.strptime(date_time_str, '%Y-%m-%d/%H:%M:%S').strftime('%d/%m/%Y')
print(date_str)

Output:

08/10/2021

Upvotes: 2

Justin J AR
Justin J AR

Reputation: 103

As mentioned by @C14L that method can be followed but my guess seeing your exception is you need to add a seconds format (%S) to your time formatting, so the updated code wld be like

try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")

Upvotes: 2

Daweo
Daweo

Reputation: 36873

I would use regular expressions for that as follows:

import pandas as pd
df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})
df["daypart"] = df["daytime"].str.replace(r" \d\d:\d\d","")  # drop HH:MM part
df["day"] = df["daypart"].str.replace(r"(\d\d\d\d)/(\d\d)/(\d\d)", r"\3/\2/\1")
print(df)

output

            daytime     daypart         day
0  31/12/2020 23:59  31/12/2020  31/12/2020
1  2020/12/31 23:59  2020/12/31  31/12/2020

Explanation: I used so-called capturing groups in second .replace, if there is (4 digits)/(2 digits)/(2 digits) their order is re-arranged that 3rd become 1st, 2nd become 2nd and 1st become 3rd (note that group are 1-based, not 0-base like is case with general python indexing). AS day format is now consistent you could be able to parse it easily.

Upvotes: 2

Related Questions