Srpic

Reputation: 450

Multi-format string column to be converted into date format

I have a column that represents dates but is stored as strings. I can't use a simple pandas.to_datetime call, because the values come in several formats, like:

01/02/2023
Apr 02, 2016
Jun 2021
2023/12/01

I tried to write the function below, which lists the potential date formats and loops over them to convert the column into the desired format. However, it is obviously wrong, because the column contains only None after applying it. Could you point me in a better direction, or tell me what I should change, please?

def DateFormat(data):
    for fmt in ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%b/%d'):
        try:
            pd.to_datetime(data['date'], format=fmt)
            
        except ValueError:
            pass


data['date'] =  data.apply(DateFormat, axis = 1)

BEFORE APPLY DateFormat:

| ID | Date |
| --- | -------------- |
| 1 | 01/02/2023 |
| 2 | Apr 02, 2016 |
| 3 | Jun 2021 |
| 4 | 2023/12/01 |

AFTER APPLY DateFormat:

| ID | Date |
| --- | ----- |
| 1 | None |
| 2 | None |
| 3 | None |
| 4 | None |

Upvotes: 0

Views: 77

Answers (2)

furas

Reputation: 143110

If a function never reaches a return statement, Python implicitly returns None when the function ends. Your function converts the value but discards the result, so apply gets None for every row.

You should use return pd.to_datetime(...)

If you want to keep the original value when no format matches, add a return at the end of the function. Alternatively, return some default value there.

def DateFormat(data):
    for fmt in ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%b/%d'):
        try:
            return pd.to_datetime(data['date'], format=fmt)
        except ValueError:
            pass

    # return original `date` if it couldn't convert
    return data['date']

    # or return some default value
    #return datetime.datetime(1900, 1, 1)
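
As a minimal sketch of the "default value" option, the fallback could be pd.NaT, so that unparseable entries are marked as missing instead of being left as strings. The names FORMATS and parse_or_nat are only illustrative (they are not from the question), and the format list assumes '%Y/%m/%d' is what the year-first values need:

```python
import pandas as pd

# Candidate formats, tried in order (illustrative list).
FORMATS = ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%m/%d')

def parse_or_nat(value):
    """Return the first successful parse of one string, or NaT if none match."""
    for fmt in FORMATS:
        try:
            return pd.to_datetime(value, format=fmt)
        except ValueError:
            pass  # this format did not match, try the next one
    return pd.NaT

dates = pd.Series(['01/02/2023', 'not a date'])
parsed = dates.apply(parse_or_nat)
print(parsed)
```

Here the function is applied to a single column with Series.apply, so it receives one string at a time rather than a whole row.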

EDIT:

Minimal working code.

I also added the code from @Corralien's answer; it works for this data too.

import pandas as pd

# --- functions ---
    
def parse_date(row):
    # I had to add '%Y/%m/%d': '%Y/%b/%d' expects a month name, so it can't match '2023/12/01'
    for fmt in ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%b/%d', '%Y/%m/%d'):
        try:
            return pd.to_datetime(row['date'], format=fmt)
        except ValueError:
            pass

# --- main ---

data = pd.DataFrame({
    'date': ['01/02/2023', 'Apr 02, 2016', 'Jun 2021', '2023/12/01']
})
        
data['new_date_1'] = data.apply(parse_date, axis=1)

data['new_date_2'] = pd.to_datetime(data['date'], dayfirst=True)

print(data)

Result:

           date new_date_1 new_date_2
0    01/02/2023 2023-02-01 2023-02-01
1  Apr 02, 2016 2016-04-02 2016-04-02
2      Jun 2021 2021-06-01 2021-06-01
3    2023/12/01 2023-12-01 2023-12-01
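
If the row-by-row apply turns out to be slow on a large column, one possible alternative is to run each format over the whole column with errors='coerce' and keep the first successful parse per row. This is only a sketch: parse_with_formats is a hypothetical helper name, and the order of the formats matters whenever a value could match more than one of them.

```python
import pandas as pd

def parse_with_formats(series, formats):
    """Try each format on the whole column; keep the first match per row."""
    first, *rest = formats
    # Rows that do not match the format become NaT instead of raising.
    result = pd.to_datetime(series, format=first, errors='coerce')
    for fmt in rest:
        parsed = pd.to_datetime(series, format=fmt, errors='coerce')
        # Fill only the rows that are still unparsed.
        result = result.fillna(parsed)
    return result

data = pd.DataFrame({
    'date': ['01/02/2023', 'Apr 02, 2016', 'Jun 2021', '2023/12/01']
})
formats = ('%b %d, %Y', '%d/%m/%Y', '%b %Y', '%Y/%m/%d')
data['new_date_3'] = parse_with_formats(data['date'], formats)
print(data)
```

Rows that match none of the formats stay NaT, which makes them easy to find afterwards with data['new_date_3'].isna().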

Upvotes: 1

Corralien

Reputation: 120559

Why not let pandas infer the datetime format, and pass dayfirst=True so that your first date format (%d/%m/%Y) is read day-first?

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print(df)

# Output
   ID       Date
0   1 2023-02-01
1   2 2016-04-02
2   3 2021-06-01
3   4 2023-12-01
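
One caveat, as far as I know: pandas 2.0 and later infer a single format from the first value and apply it to the whole column, so the call above may raise a ValueError on mixed-format data there. Assuming pandas 2.0 or newer, passing format='mixed' asks pandas to infer the format for each element individually. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Date': ['01/02/2023', 'Apr 02, 2016', 'Jun 2021', '2023/12/01'],
})

# format='mixed' requires pandas >= 2.0; each value is parsed on its own.
# dayfirst=True makes ambiguous values such as 01/02/2023 read as 1 February.
df['Date'] = pd.to_datetime(df['Date'], format='mixed', dayfirst=True)
print(df)
```

Per-element inference can silently guess wrong on ambiguous values, so an explicit list of formats is the safer choice when the possible formats are known.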

Upvotes: 1
