changhons
changhons

Reputation: 75

Python: Changing multiple dates format in string to datetime format

I am trying to convert different date in string format to datetime format. I have about ~ 1,000,000 rows of dates in column 'status_change_date'. The issue is that there are many different format and I do not know in what format they all are.

So, I am trying this function on my jupyter note:

def parsing_date(date_string):
    for date_format in ("%d/%m/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S%p", "%d/%m/%Y %H:%M:%S%p", "%-m/%-d/%Y %H:%M:%S", "%-d/%-m/%Y %H:%M:%S", "%-m/%-d/%Y %H:%M:%S%p", "%-d/%-m/%Y %H:%M:%S%p"):
        try:
            return datetime.strptime(date_string, date_format)
        except ValueError:
            raise
            print(date_string)
        raise ValueError('Something is wrong')

then,

data['status_chage_date'].apply(parsing_date)

I am printing out the error so I can take account one by one in my function till there is no error anymore. However, I am encountering this error message:

ValueError: time data '17/10/2019 05:49:51' does not match format '%m/%d/%Y %H:%M:%S'

or

strptime() argument 1 must be str, not None

I thought I covered the format in my function and the second error is strange as I excluded rows that has None value.

What am I doing wrong and is there a better way to do this?

Upvotes: 2

Views: 92

Answers (1)

bcosta12
bcosta12

Reputation: 2452

note that you raise when an error occurs! You need to test all your possibilities before:

import pandas as pd 

def parsing_date(date_string):
    d = None
    for date_format in ("%d/%m/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S%p", "%d/%m/%Y %H:%M:%S%p", "%-m/%-d/%Y %H:%M:%S", "%-d/%-m/%Y %H:%M:%S", "%-m/%-d/%Y %H:%M:%S%p", "%-d/%-m/%Y %H:%M:%S%p"):
        try:
            d = datetime.strptime(date_string, date_format)
            break
        except:
            pass
    if d is not None:
        return d
    else:
        return pd.NaT

Also notice that if your day and month are both between the numbers 1 to 12,you will not be sure that your parsing to the correct datetime. If possible, you should seek what causes the different formats and parse them individually.

Upvotes: 5

Related Questions