Reputation: 7458
I have the following df:
inv_date
24/01/2008
nan
nan
nan
13/08/2007
02/04/2007
02/04/2007
03/04/2007
03/04/2007
03/04/2007
04/04/2007
09/08/2007 12:16:55
The values in inv_date are all strings. I tried to convert them into datetime64 with the code below. The format is inferred from inv_date as the most common date format. I don't slice inv_date with str[:10] because the values are not always just day/month/year; some also include hour/min/sec, so slicing the values at a fixed position is not ideal:
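import pandas as pd

# Rows that do not match the most common format, '%d/%m/%Y'
failed_rows = pd.isnull(pd.to_datetime(df['inv_date'], errors='coerce', format='%d/%m/%Y'))
if failed_rows.sum():
    # Let pandas infer the format of those rows and drop the time of day
    df.loc[failed_rows, 'inv_date'] = pd.to_datetime(df.loc[failed_rows, 'inv_date'], errors='coerce').dt.floor('D')
# Parse the matching rows with the explicit format
df.loc[~failed_rows, 'inv_date'] = pd.to_datetime(df.loc[~failed_rows, 'inv_date'], errors='coerce', format='%d/%m/%Y')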
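The code above hard-codes '%d/%m/%Y', and the question doesn't show how the most common format is inferred. A minimal sketch of one way to do it, assuming a hypothetical helper `most_common_format` and a hand-picked list of candidate formats (neither comes from the original code): count how many values each candidate parses exactly and keep the winner.

```python
import pandas as pd

# Hypothetical candidate list; extend it with whatever formats the data may contain
CANDIDATE_FORMATS = ['%d/%m/%Y', '%d/%m/%Y %H:%M:%S', '%Y-%m-%d']


def most_common_format(s: pd.Series) -> str:
    """Hypothetical helper: return the candidate format that parses the most values."""
    counts = {
        fmt: int(pd.to_datetime(s, format=fmt, errors='coerce').notna().sum())
        for fmt in CANDIDATE_FORMATS
    }
    # max() keeps the first maximum, so ties go to the format listed first
    return max(counts, key=counts.get)


sample = pd.Series(['24/01/2008', 'nan', '13/08/2007', '02/04/2007', '09/08/2007 12:16:55'])
fmt = most_common_format(sample)  # could then be passed as format=fmt
```

Because `errors='coerce'` turns every non-matching string into NaT, `notna().sum()` is simply the number of exact matches per format.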
The result turns out to be:
inv_date
1201132800000000000
None
None
None
1186963200000000000
1175472000000000000
1175472000000000000
1175558400000000000
1175558400000000000
1175558400000000000
1175644800000000000
1189209600000000000
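The integers above look like nanoseconds since the Unix epoch, which is how pandas stores datetime64 values internally; the likely cause is that `.loc` writes the parsed datetimes into only part of a column that still has dtype object. A quick check, decoding three of the printed values with `pd.to_datetime` (which treats integers as nanoseconds by default), gives 24 January 2008, 2 April 2007 and 8 September 2007. The last one shows that `09/08/2007 12:16:55` was read month first.

```python
import pandas as pd

# Three integers copied from the output above
raw = [1201132800000000000, 1175472000000000000, 1189209600000000000]

# Integers are interpreted as nanoseconds since 1970-01-01 (unit='ns' is the default)
decoded = pd.to_datetime(raw)
print(decoded)
```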
The ideal result should look like this:
inv_date
24/01/2008
NaT
NaT
NaT
13/08/2007
02/04/2007
02/04/2007
03/04/2007
03/04/2007
03/04/2007
04/04/2007
09/08/2007
with dtype datetime64.
Upvotes: 0
Views: 85
Reputation: 402872
Just convert to datetime, normalize, and convert back to string. NaTs are retained.
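import pandas as pd

(pd.to_datetime(df['inv_date'], errors='coerce')  # values that can't be parsed become NaT
   .dt.normalize()                                 # set the time of day to midnight
   .dt.strftime('%d/%m/%Y'))                       # format back to day/month/year strings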
0 24/01/2008
1 NaT
2 NaT
3 NaT
4 13/08/2007
5 04/02/2007
6 04/02/2007
7 04/03/2007
8 04/03/2007
9 04/03/2007
10 04/04/2007
11 08/09/2007
Name: inv_date, dtype: object
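In the output above, `02/04/2007` comes back as `04/02/2007` and `09/08/2007` as `08/09/2007`: without `dayfirst=True`, pandas reads ambiguous strings month first. The result is also a string column (dtype object) rather than datetime64. A minimal sketch of a variant that keeps datetime64, assuming pandas >= 2.0 for `format='mixed'` (in pandas 2.x, a single format inferred from the first value may otherwise coerce `09/08/2007 12:16:55` to NaT):

```python
import pandas as pd

# Hypothetical sample mirroring the question's data
df = pd.DataFrame({'inv_date': ['24/01/2008', 'nan', '02/04/2007', '09/08/2007 12:16:55']})

# format='mixed' (pandas >= 2.0) parses each string on its own;
# dayfirst=True reads '02/04/2007' as 2 April instead of 4 February
dates = pd.to_datetime(df['inv_date'], format='mixed', dayfirst=True, errors='coerce')

# normalize() sets the time to midnight but keeps dtype datetime64
df['inv_date'] = dates.dt.normalize()

# Only if day/month/year text is needed for display (this result has dtype object)
as_text = df['inv_date'].dt.strftime('%d/%m/%Y')
```

A datetime64 column carries no display format of its own, so it prints as `2007-04-02`; `strftime` is only needed when the `dd/mm/YYYY` text itself is required.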
Upvotes: 1