Timetraveller

Reputation: 314

Convert content of Object datatype to Date datatype in Python

I am using Jupyter Notebook with pandas and Python. I have a DataFrame of shape (10500, 4), i.e. 10500 records and 4 columns. One of the 4 columns, Initial_Date, has the object dtype. This is the kind of data it contains:

Initial_Date
1971
11969
102006
03051992
00131954
27001973
45061987
1996

It is easy to make out that the format of the column is DDMMYYYY (03051992 is 3rd May 1992).

Note: As you can see, there are invalid MM values (00 and 13) and invalid DD values (00 and 45).

I would like to use regex to extract whatever is available in the field. I don't know how to read YYYY separately from MM or DD, so please enlighten me here. After the extraction, I would like to test whether YYYY, MM and DD are valid. If any of them is invalid, assign NaT; otherwise produce DD-MM-YYYY or DD/MM/YYYY (I'm not fussy about the end format).

For example, 051992 is considered invalid because the day is missing: it would become DD/05/1992.

A field that has all 8 digits, for example 10081996, is considered valid: 10/08/1996.

PS. I am just starting out with pandas and Jupyter Notebook, and slowly reviving my Python skills. FYI, if you think there is a better way to convert each field to a valid date datatype, please do enlighten me.

Upvotes: 1

Views: 880

Answers (1)

MaxU - stand with Ukraine

Reputation: 210852

You can do it this way:

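import pandas as pd

# First pass: let pandas infer the format; anything it cannot parse becomes NaT.
result = pd.to_datetime(d.Initial_Date.astype(str), dayfirst=True, errors='coerce')

# Second pass: retry the NaT rows with an explicit %d%m%Y format (fillna replaces .ix, which was removed in pandas 1.0).
result = result.fillna(pd.to_datetime(d.Initial_Date.astype(str), format='%d%m%Y', dayfirst=True, errors='coerce'))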

Result:

In [88]: result
Out[88]:
0   1971-01-01
1          NaT
2   2006-10-20
3   1992-03-05
4   1954-01-03
5          NaT
6          NaT
7   1996-01-01
Name: Initial_Date, dtype: datetime64[ns]
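
Note that the output above still accepts short values such as 1971 and 102006. If only full 8-digit values should count as valid, as the question describes, a stricter variant might look like the sketch below. The sample Series `s` is an assumption built from the values listed in the question (kept as strings so leading zeros survive), and `Series.str.fullmatch` needs pandas 1.1 or later.

```python
import pandas as pd

# Sample values copied from the question, stored as strings (object dtype).
s = pd.Series(
    ["1971", "11969", "102006", "03051992",
     "00131954", "27001973", "45061987", "1996"],
    name="Initial_Date",
)

# Only strings made of exactly 8 digits are candidates for DDMMYYYY.
is_full = s.str.fullmatch(r"\d{8}")

# Everything else is masked to NaN first, then parsed strictly;
# invalid days or months (00, 13, 45) are coerced to NaT.
parsed = pd.to_datetime(s.where(is_full), format="%d%m%Y", errors="coerce")
```

With this stricter rule, only 03051992 from the sample should survive as a date; every other row ends up as NaT.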

Original DF:

In [89]: d
Out[89]:
   Initial_Date
0          1971
1         11969
2        102006
3       3051992
4        131954
5      27001973
6      45061987
7          1996
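
Since the question asks specifically about reading DD, MM and YYYY separately with a regex, here is a minimal sketch of that idea using only the standard library. The helper name `parse_ddmmyyyy` and the `samples` list are made up for illustration; the rule it applies (exactly 8 digits, then a calendar check) is taken from the question.

```python
import re
from datetime import date

# Three capture groups: day (2 digits), month (2 digits), year (4 digits).
DDMMYYYY = re.compile(r"(\d{2})(\d{2})(\d{4})")


def parse_ddmmyyyy(value):
    """Return a date for a valid 8-digit DDMMYYYY string, otherwise None."""
    match = DDMMYYYY.fullmatch(str(value))
    if match is None:
        return None  # wrong length or non-digit characters
    day, month, year = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for impossible values such as day 45 or month 13.
        return date(year, month, day)
    except ValueError:
        return None


samples = ["1971", "051992", "03051992", "00131954", "45061987", "10081996"]
parsed_samples = [parse_ddmmyyyy(v) for v in samples]
```

In pandas, something like pd.to_datetime(d.Initial_Date.astype(str).map(parse_ddmmyyyy)) should then turn the None entries into NaT. Values that lost their leading zero when read as integers (for example 3051992) would need zero-padding first, e.g. with .str.zfill(8), if they are meant to be treated as valid.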

Upvotes: 1
