Reputation: 45
I wrote a simple function to convert a date or datetime string to a date or datetime value. I was expecting the function to return either a date or a datetime, depending on the length of the string.
The function and the code that calls it are below:
import pandas as pd

def type_convert(var):
    # Pick the parsing format from the length of the string
    if len(var) == 10:
        return pd.to_datetime(var, format='%Y-%m-%d').date()
    elif len(var) == 16:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M')
    elif len(var) == 19:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M:%S')

df_test = pd.DataFrame({'a': ['2017-12-13T23:01', '2016-11-15T18:00:00', '2018-04-09']})
print(df_test['a'].apply(type_convert))
I was expecting the result to be:
0 2017-12-13 23:01:00
1 2016-11-15 18:00:00
2 2018-04-09
i.e. I was expecting that the date-only value would not be returned as a datetime. What I actually got was:
0 2017-12-13 23:01:00
1 2016-11-15 18:00:00
2 2018-04-09 00:00:00
I've tried writing test code to return multiple data types from a function and that works fine so I'm guessing this is more to do with how Python handles dates and datetime values. Any help understanding what I'm missing would be appreciated. Thanks!
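For what it's worth, one way to separate what the function returns from what the Series does with those values is to call type_convert on single strings and inspect the types. The sketch below reuses the function from above. The object-dtype Series at the end is only one possible workaround and is an assumption on my part, not something from the original code: a datetime64 column can only hold full timestamps, whereas an object column can hold a mix of date and Timestamp values. What apply itself returns for mixed values may depend on the pandas version.

```python
import datetime
import pandas as pd

def type_convert(var):
    # Same function as in the question
    if len(var) == 10:
        return pd.to_datetime(var, format='%Y-%m-%d').date()
    elif len(var) == 16:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M')
    elif len(var) == 19:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M:%S')

values = ['2017-12-13T23:01', '2016-11-15T18:00:00', '2018-04-09']

# Called on single strings, the function returns different types
converted = [type_convert(v) for v in values]
for v in converted:
    print(type(v).__name__, v)

# Forcing dtype=object stops pandas from inferring one datetime dtype
# for the whole Series, so each element keeps the type it was given
mixed = pd.Series(converted, dtype=object)
print(mixed.dtype)
```

If the single-value calls show a date for the 10-character string, the function is doing what was intended and the extra 00:00:00 comes from the Series being stored with a single datetime dtype.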
Upvotes: 2
Views: 403
Reputation: 2349
Huh. Well, I found the answer: for some reason, wrapping df_test['a'].apply(type_convert) inside a print() statement gives a different result from performing the apply and then printing the result separately. You can see the difference for yourself if you do:
import pandas as pd

def type_convert(var):
    if len(var) == 10:
        return pd.to_datetime(var, format='%Y-%m-%d').date()
    elif len(var) == 16:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M')
    elif len(var) == 19:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M:%S')

df_test = pd.DataFrame({'a': ['2017-12-13T23:01', '2016-11-15T18:00:00', '2018-04-09']})
print(df_test['a'].apply(type_convert))
#### This will give you the original result

df_test = pd.DataFrame({'a': ['2017-12-13T23:01', '2016-11-15T18:00:00', '2018-04-09']})
df_test['a'].apply(type_convert)
print(df_test)
#### This will give you the desired result
Follow-up question: why is this the case? What is print doing differently from the in-place modification?
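On the follow-up: as far as I can tell, print() is not what makes the difference. Series.apply returns a new Series and does not modify df_test in place, so when the return value is not assigned anywhere, the converted values are discarded and print(df_test) shows the original strings. Those strings only look like the desired result. A minimal sketch to check this, where original and result are just illustrative names:

```python
import pandas as pd

def type_convert(var):
    # Same function as in the question
    if len(var) == 10:
        return pd.to_datetime(var, format='%Y-%m-%d').date()
    elif len(var) == 16:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M')
    elif len(var) == 19:
        return pd.to_datetime(var, format='%Y-%m-%dT%H:%M:%S')

df_test = pd.DataFrame({'a': ['2017-12-13T23:01', '2016-11-15T18:00:00', '2018-04-09']})
original = df_test['a'].tolist()

# apply returns a new Series; the return value is thrown away here
df_test['a'].apply(type_convert)

# The column still holds the untouched input strings
print(df_test['a'].tolist() == original)
print(all(isinstance(v, str) for v in df_test['a']))

# To keep the converted values, capture (or assign back) the result
result = df_test['a'].apply(type_convert)
print(result.dtype)
```

Printing result instead of df_test should show the same output as the first snippet, whether or not the apply call sits inside print().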
Upvotes: 1