Reputation: 75
I am trying to convert different date in string format to datetime format. I have about ~ 1,000,000 rows of dates in column 'status_change_date'. The issue is that there are many different format and I do not know in what format they all are.
So, I am trying this function on my jupyter note:
def parsing_date(date_string):
for date_format in ("%d/%m/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S%p", "%d/%m/%Y %H:%M:%S%p", "%-m/%-d/%Y %H:%M:%S", "%-d/%-m/%Y %H:%M:%S", "%-m/%-d/%Y %H:%M:%S%p", "%-d/%-m/%Y %H:%M:%S%p"):
try:
return datetime.strptime(date_string, date_format)
except ValueError:
raise
print(date_string)
raise ValueError('Something is wrong')
then,
data['status_chage_date'].apply(parsing_date)
I am printing out the error so I can take account one by one in my function till there is no error anymore. However, I am encountering this error message:
ValueError: time data '17/10/2019 05:49:51' does not match format '%m/%d/%Y %H:%M:%S'
or
strptime() argument 1 must be str, not None
I thought I covered the format in my function and the second error is strange as I excluded rows that has None value.
What am I doing wrong and is there a better way to do this?
Upvotes: 2
Views: 92
Reputation: 2452
note that you raise when an error occurs! You need to test all your possibilities before:
import pandas as pd
def parsing_date(date_string):
d = None
for date_format in ("%d/%m/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S%p", "%d/%m/%Y %H:%M:%S%p", "%-m/%-d/%Y %H:%M:%S", "%-d/%-m/%Y %H:%M:%S", "%-m/%-d/%Y %H:%M:%S%p", "%-d/%-m/%Y %H:%M:%S%p"):
try:
d = datetime.strptime(date_string, date_format)
break
except:
pass
if d is not None:
return d
else:
return pd.NaT
Also notice that if your day and month are both between the numbers 1 to 12,you will not be sure that your parsing to the correct datetime. If possible, you should seek what causes the different formats and parse them individually.
Upvotes: 5