bbennett36
bbennett36

Reputation: 6345

Pandas - "time data does not match format " error when the string does match the format?

I'm getting a value error saying my data does not match the format when it does. Not sure if this is a bug or I'm missing something here. I'm referring to this documentation for the string format. The weird part is if I write the 'data' Dataframe to a csv and read it in then call the function below it will convert the date so I'm not sure why it doesn't work without writing to a csv.

Any ideas?

data['Date'] = pd.to_datetime(data['Date'], format='%d-%b-%Y')  

I'm getting two errors

 TypeError: Unrecognized value type: <class 'str'>  

 ValueError: time data '27‑Aug‑2018' does not match format '%d-%b-%Y' (match) 

Example dates -

2‑Jul‑2018     
27‑Aug‑2018    
28‑May‑2018    
19‑Jun‑2017    
5‑Mar‑2018     
15‑Jan‑2018    
11‑Nov‑2013    
23‑Nov‑2015    
23‑Jun‑2014    
18‑Jun‑2018    
30‑Apr‑2018    
14‑May‑2018    
16‑Apr‑2018    
26‑Feb‑2018    
19‑Mar‑2018    
29‑Jun‑2015   

Is it because they all aren't double digit days? What is the string format value for single digit days? Looks like this could be the cause but I'm not sure why it would error on the '27' though.

End solution (It was unicode & not a string) -

data['Date'] = data['Date'].apply(unidecode.unidecode)  
data['Date'] = data['Date'].apply(lambda x: x.replace("-", "/"))    
data['Date'] = pd.to_datetime(data['Date'], format="%d/%b/%Y")

Upvotes: 1

Views: 11433

Answers (2)

BENY
BENY

Reputation: 323226

You got a special mark here it is not -

df.iloc[0,0][2]
Out[287]: '‑'

Replace it with '-'

pd.to_datetime(df.iloc[:,0].str.replace('‑','-'),format='%d-%b-%Y')
Out[288]: 
0    2018-08-27
1    2018-05-28
2    2017-06-19
3    2018-03-05
4    2018-01-15
5    2013-11-11
6    2015-11-23
7    2014-06-23
8    2018-06-18
9    2018-04-30
10   2018-05-14
11   2018-04-16
12   2018-02-26
13   2018-03-19
14   2015-06-29
Name: 2‑Jul‑2018, dtype: datetime64[ns]

Upvotes: 1

Yuca
Yuca

Reputation: 6091

There seems to be an issue with your date strings. I replicated your issue with your sample data and if I remove the hyphens and replace them manually (for the first three dates) then the code works

pd.to_datetime(df1['Date'] ,errors ='coerce')

output:

0    2018-07-02
1    2018-08-27
2    2018-05-28
3           NaT
4           NaT
5           NaT
6           NaT
7           NaT
8           NaT
9           NaT
10          NaT
11          NaT
12          NaT
13          NaT
14          NaT
15          NaT

Bottom line: your hyphens look like regular ones but are actually something else, just clean your source data and you're good to go

Upvotes: 2

Related Questions