user7779326

How to fix the "Unknown string format" error in pandas when converting strings to datetime?

I have pandas code that processes many data files. I use the following code to convert a column of date-time strings to a DatetimeIndex.

df['date_time'] = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]
df['date_time'] = pd.DatetimeIndex(df['date_time'])

But one particular data file raises this error:

raise e
ValueError: Unknown string format

What could be causing this error? If it is due to invalid data in the data file, how can I remove it?

Upvotes: 1

Views: 569

Answers (1)

jezrael

Reputation: 862511

I think you need the parameter errors='coerce' in to_datetime, which converts values that cannot be parsed as datetimes to NaT:

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')

Then, if you need to remove all rows with NaT, use dropna:

df = df.dropna(subset=['date_time'])
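
Before dropping rows, it can help to see which values actually failed to parse, since that usually reveals what is wrong in the data file. The following is only a minimal sketch: the sample values and the names parsed and bad_rows are assumptions for illustration, not taken from the real file.

```python
import pandas as pd

# Hypothetical sample data; "aaa" stands in for whatever invalid value
# the problematic file really contains.
df = pd.DataFrame({"date_time": ["2016-05-19 08:25:00",
                                 "aaa",
                                 "2016-05-20 07:45:00"]})

# Parse into a separate Series so the original strings stay available.
parsed = pd.to_datetime(df["date_time"], errors="coerce")

# Rows that had a value in the file but could not be parsed as a datetime.
bad_rows = df[parsed.isna() & df["date_time"].notna()]
print(bad_rows)
```

Keeping the parsed result separate from the original column means the offending strings are still visible for inspection; once they are understood, the column can be overwritten and the rows dropped as shown above.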

Sample:

import pandas as pd

# The last value, "aaa", is deliberately not a valid datetime
a = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00",
     "2016-05-24 12:50:00","2016-05-25 23:00:00","aaa"]
df = pd.DataFrame({'date_time':a})
print (df)
             date_time
0  2016-05-19 08:25:00
1  2016-05-19 16:00:00
2  2016-05-20 07:45:00
3  2016-05-24 12:50:00
4  2016-05-25 23:00:00
5                  aaa

df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')
print (df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00
5                 NaT

df = df.dropna(subset=['date_time'])
print (df)
            date_time
0 2016-05-19 08:25:00
1 2016-05-19 16:00:00
2 2016-05-20 07:45:00
3 2016-05-24 12:50:00
4 2016-05-25 23:00:00
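
If the goal is a DatetimeIndex, as in the question, the cleaned column can then be set as the index. This is a sketch under that assumption; whether the real DataFrame should be indexed by date_time is not stated in the question.

```python
import pandas as pd

# Same sample values as above; "aaa" is the deliberately invalid entry.
a = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
     "2016-05-24 12:50:00", "2016-05-25 23:00:00", "aaa"]
df = pd.DataFrame({"date_time": a})

# Coerce unparseable strings to NaT, drop those rows, then index by the column.
df["date_time"] = pd.to_datetime(df["date_time"], errors="coerce")
df = df.dropna(subset=["date_time"]).set_index("date_time")

print(type(df.index))
```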

Upvotes: 1
