F1990
F1990

Reputation: 637

Pandas: Change dates in dataframe to same date format

I have a dataframe that contains a column which holds:

Date:
31MAR2005
30-06-05
311205

I would like to convert these dates to the format : 30-06-05 (DD-MM-JJ). What is the simplest way to do this? The fields are not in a date format yet, only strings.

Upvotes: 3

Views: 4201

Answers (2)

unutbu
unutbu

Reputation: 879103

You could use Pandas' vectorize string methods to extract the day, month and year from each date string:

import pandas as pd

df = pd.DataFrame(['31MAR2005', '30-06-05', '311205'], columns=['Date'])
tmp = df['Date'].str.extract(r'(\d{2})-?(\D{3}|\d{2})-?.*(\d{2})')
tmp.columns = ['day', 'month', 'year']

yields

In [228]: tmp
Out[228]: 
  day month year
0  31   MAR   05
1  30    06   05
2  31    12   05

Now you can change 3-letter month abbreviations to numeric strings by calling Series.map:

import calendar
monthmap = {calendar.month_abbr[i].upper(): '{:02d}'.format(i) for i in range(1, 13)}
monthmap.update({'{:02d}'.format(i):'{:02d}'.format(i) for i in range(1, 13)})
tmp['month'] = tmp['month'].map(monthmap)

yields

In [230]: tmp
Out[230]: 
  day month year
0  31    03   05
1  30    06   05
2  31    12   05

And finally, you can re-assign df['Date'] to the desired date-string format:

df['Date'] = tmp['day']+'-'+tmp['month']+'-'+tmp['year']

yields

In [232]: df
Out[232]: 
       Date
0  31-03-05
1  30-06-05
2  31-12-05

Especially if the DataFrame is long, using vectorized string methods should be faster than using df.apply to call a Python function once for every row value.

Also note that this accomplishes the task without parsing the strings as Timestamps. That might be a good or a bad thing. On the one hand, it may improve performance. On the other hand, it may allow invalid date strings (such as '30FEB2005') to slip through.

After re-formatting the strings, you could call

df['Date'] = pd.to_datetime(df['Date'])

to convert the date strings into proper Timestamps. Then invalid date strings would become NaT (Not-a-Timestamp) values.

Upvotes: 2

Alex
Alex

Reputation: 826

Here is my example :

def string_to_date(my_string):
    if '-' in my_string:
        return datetime.datetime.strptime(my_string, '%d-%m-%y')
    elif my_string.isdigit():
        return datetime.datetime.strptime(my_string, '%d%m%y')
    elif my_string.isalnum():
        return datetime.datetime.strptime(my_string, '%d%b%Y')

now I'm testing it on your dataframe df :

In[116]: df['Date:'].apply(lambda x: string_to_date(x))
Out[114]: 
0   2005-03-31
1   2005-06-30
2   2005-12-31
Name: Date:, dtype: datetime64[ns]

Upvotes: 3

Related Questions