Levo
Levo

Reputation: 63

How do I remove time from Pandas df column when it isn't in the excel column?

I have read in two worksheets separately into pandas dataframes. Both have date issued columns.

In both worksheets the column is held in excel as dd/mm/yyyy.

ws1

13/02/2017
01/02/2017
08/11/2016
05/08/2016
16/03/2017

53   2017-02-13
51   2017-02-01
22   2016-11-08
0    2016-08-05
63   2017-03-16
Name: Date Issued, dtype: datetime64[ns]

But ws2

08/03/2017
24/08/2016
28/11/2016
26/10/2016
10/03/2017


0    2017-03-08 00:00:00
1    2016-08-24 00:00:00
2    2016-11-28 00:00:00
3    2016-10-26 00:00:00
4    2017-03-10 00:00:00
Name: Date Issued, dtype: object

Why is it that the dtypes are different and how can I apply remove the time?

The code currently looks like nothing out the ordinary for dealing with pandas

df = pd.read_excel(file, 'ws2')
df = df.loc[:, ['Date Issued', 'Person ID',
                          'First Name', 'Surname', 'Type',  'Amount']]
df = df.sort_values(by=['Surname'])
df['Date Issued'] = pd.to_datetime(df_loan['Date Issued'], dayfirst=True)

I have tried to use;

df['Date Issued'] = pd.to_datetime(df['Date Issued'], dayfirst=True)

But get the following error;

TypeError: invalid string coercion to datetime

Also;

df['Date Issued'] = df['Date Issued'].astype('datetime64[ns]')

But getting this error;

ValueError: Error parsing datetime string " " at position 1

Upvotes: 1

Views: 500

Answers (1)

jezrael
jezrael

Reputation: 862801

It seems there is at least one non datetime value.

So need parameter errors='coerce' for convert these values to NaT (NaN for datetimes) in to_datetime:

df['Date Issued'] = pd.to_datetime(df['Date Issued'], dayfirst=True, errors='coerce')

Upvotes: 1

Related Questions