Zhubarb

Reputation: 11905

Pandas version 0.12 to version 0.13 to_datetime incompatibility

I have a Pandas version 0.12 data frame. I am trying to convert the months within a series of date strings to textual format, e.g. 04 = April, 05 = May. I ended up having to work with two different versions of Pandas (0.12 and 0.13), which seem to have substantial interface changes.

df['date']
0    15/04/2013
1    09/02/2015
2    05/01/2015
3    26/01/2015
4    26/01/2015
Name: date, dtype: object

type(df['date'][0])
<type 'str'> 

The code below works with Pandas version 0.13 and converts each entry in the series, e.g. 15/02/2015 to 15 February 2015.

df.date = pd.to_datetime(df['date'], format="%d/%m/%Y").apply( lambda x:  x.date().strftime('%d %B %Y') ) 

But it throws an error with version 0.12:

File "/.../pandas/tseries/tools.py", line 124, in to_datetime
    values = _convert_listlike(arg.values, box=False)
File "/.../pandas/tseries/tools.py", line 103, in _convert_listlike
    result = tslib.array_strptime(arg, format)
File "tslib.pyx", line 1112, in pandas.tslib.array_strptime (pandas/tslib.c:18277)
TypeError: expected string or buffer

I just need to get this done. Any ideas on what the Pandas 0.12 version of the above code would be? I do not have to use to_datetime() either, so any alternative solutions are very welcome!

EDIT:

I tried this line upon @EdChum's recommendation:

df['date'] = df['date'].apply(lambda x: dt.datetime.strptime(x,'%d/%m/%Y')).apply( lambda x: x.date().strftime('%d %B %Y') )

But it gives the error:

File "/.../pandas/core/series.py", line 2536, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
File "inference.pyx", line 864, in pandas.lib.map_infer (pandas/lib.c:42840)
File " in <lambda>
    df['date'] = df['date'].apply( lambda x: dt.datetime.strptime(x,'%d/%m/%Y')).apply( lambda x: x.date().strftime('%d %B %Y') )
TypeError: must be string, not float

I think this answers @joris's comment as well: the issue seems to be with the .apply() part. I do not understand how or where a float is created in this line of code.

Upvotes: 0

Views: 165

Answers (1)

joris

Reputation: 139222

This is probably due to missing values in the column. If you call dropna before to_datetime and apply(... strftime()), it works. A small example:

In [19]: df
Out[19]:
         date
0  15/04/2013
1  09/02/2015
2         NaN

In [21]: df['date2'] = df.date = pd.to_datetime(df['date'].dropna(), format="%d/%m/%Y").apply( lambda x:  x.date().strftime('%d %B %Y') )

In [22]: df
Out[22]:
               date             date2
0     15 April 2013     15 April 2013
1  09 February 2015  09 February 2015
2               NaN               NaN
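
The "must be string, not float" error from the edit in the question fits this explanation: a missing value in a column of strings is stored as NaN, and NaN is a Python float. A minimal sketch using only the standard library (the variable names are just for illustration, and a plain float NaN stands in for the value pandas would hold):

```python
import datetime as dt
import math

# Stand-in for a missing entry in the 'date' column: NaN is a float, not a string.
missing = float("nan")

print(isinstance(missing, float))  # the missing value really is a float
print(math.isnan(missing))         # and it is NaN

# strptime only accepts strings, so passing the missing value raises TypeError.
try:
    dt.datetime.strptime(missing, "%d/%m/%Y")
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

So no float is created by the lambda itself; it is already sitting in the data wherever a date is missing.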

The reason for the difference between pandas 0.12 and 0.13 is that in 0.12 to_datetime could not yet handle missing values (NaN) in the input, and starting from 0.13 it can.
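
If you would rather not depend on to_datetime at all, one possible alternative is a small helper that reformats strings and passes anything else (such as NaN) through untouched. This is only a sketch: to_long_date is a hypothetical name, and I have not run it against 0.12 itself.

```python
import datetime as dt

def to_long_date(value, in_format="%d/%m/%Y", out_format="%d %B %Y"):
    """Reformat a date string; return non-string values (e.g. NaN) unchanged."""
    if not isinstance(value, str):
        # Missing values arrive as float NaN, so leave them as they are.
        return value
    return dt.datetime.strptime(value, in_format).strftime(out_format)

print(to_long_date("15/04/2013"))
print(to_long_date(float("nan")))
```

It would then be applied with df['date'] = df['date'].apply(to_long_date), which avoids dropna and keeps the missing rows in place. Note that %B gives the month name for the current locale.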

Upvotes: 1
