Reputation: 637
I have a dataframe that contains a column which holds:
Date:
31MAR2005
30-06-05
311205
I would like to convert these dates to the format : 30-06-05 (DD-MM-JJ). What is the simplest way to do this? The fields are not in a date format yet, only strings.
Upvotes: 3
Views: 4201
Reputation: 879103
You could use Pandas' vectorize string methods to extract the day, month and year from each date string:
import pandas as pd
df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
tmp = df['Date'].str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
tmp.columns = ['day', 'month', 'year']
yields
In [228]: tmp
Out[228]:
day month year
0 31 MAR 05
1 30 06 05
2 31 12 05
Now you can change 3-letter month abbreviations to numeric strings by calling Series.map
:
import calendar
monthmap = {calendar.month_abbr[i].upper(): '{:02d}'.format(i) for i in range(1, 13)}
monthmap.update({'{:02d}'.format(i):'{:02d}'.format(i) for i in range(1, 13)})
tmp['month'] = tmp['month'].map(monthmap)
yields
In [230]: tmp
Out[230]:
day month year
0 31 03 05
1 30 06 05
2 31 12 05
And finally, you can re-assign df['Date']
to the desired date-string format:
df['Date'] = tmp['day']+'-'+tmp['month']+'-'+tmp['year']
yields
In [232]: df
Out[232]:
Date
0 31-03-05
1 30-06-05
2 31-12-05
Especially if the DataFrame is long,
using vectorized string methods should be faster than using df.apply
to call a Python function once for every row value.
Also note that this accomplishes the task without parsing the strings as
Timestamps. That might be a good or a bad thing.
On the one hand, it may improve performance. On
the other hand, it may allow invalid date strings (such as '30FEB2005'
) to slip through.
After re-formatting the strings, you could call
df['Date'] = pd.to_datetime(df['Date'])
to convert the date strings into proper Timestamps
. Then invalid date strings would become NaT
(Not-a-Timestamp) values.
Upvotes: 2
Reputation: 826
Here is my example :
def string_to_date(my_string):
if '-' in my_string:
return datetime.datetime.strptime(my_string, '%d-%m-%y')
elif my_string.isdigit():
return datetime.datetime.strptime(my_string, '%d%m%y')
elif my_string.isalnum():
return datetime.datetime.strptime(my_string, '%d%b%Y')
now I'm testing it on your dataframe df :
In[116]: df['Date:'].apply(lambda x: string_to_date(x))
Out[114]:
0 2005-03-31
1 2005-06-30
2 2005-12-31
Name: Date:, dtype: datetime64[ns]
Upvotes: 3