Chris T.
Chris T.

Reputation: 1801

Converting different date time formats to MM/DD/YYYY format in pandas dataframe

I have a date column in a pandas.DataFrame in various date time formats and stored as list object, like the following:

            date
1    [May 23rd, 2011]
2    [January 1st, 2010]
    ...
99   [Apr. 15, 2008]
100  [07-11-2013]
    ...
256  [9/01/1995]
257  [04/15/2000]
258  [11/22/68]
    ...
360  [12/1997]
361  [08/2002]
     ...
463  [2014]
464  [2016]

For the sake of convenience, I want to convert them all to MM/DD/YYYY format. It doesn't seem possible to use regex replace() function to do this, since one cannot execute this operation over list objects. Also, to use strptime() for each cell will be too time-consuming.

What will be the easier way to convert them all to the desired MM/DD/YYYY format? I found it very hard to do this on list objects within a dataframe.

Note: for cell values of the form [YYYY] (e.g., [2014] and [2016]), I will assume they are the first day of that year (i.e., January 1, 1968) and for cell values such as [08/2002] (or [8/2002]), I will assume they the first day of the month of that year (i.e., August 1, 2002).

Upvotes: 3

Views: 15902

Answers (3)

Sarender Reddy
Sarender Reddy

Reputation: 79

Provided code will work for following scenarios.

  • Change date format from M/D/YY to MM/DD/YY (5/2/2009 to 05/02/2009)
  • change form ANY FORMAT to MM/DD/YY

import pandas as pd

'''
       * checking provided input file date format correct or not
       * if format is correct change date format from M/D/YY to MM/DD/YY
       * else date format is not correct in input file
         Date format  change form ANY FORMAT to MM/DD/YY
  '''
input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/predictions.csv'
dest_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/Enrich.csv'
#input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/enrichment.csv'
read_data = pd.read_csv(input_file_name)
print(pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all())

if pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all():
    print("Provided correct input date format in input file....!")
    read_data['Date'] = pd.to_datetime(read_data['Date'],format='%m/%d/%Y')
    read_data['Date'] = read_data['Date'].dt.strftime('%m/%d/%Y')
    read_data.to_csv(dest_file_name,index=False)
    print(read_data['Date'])
else:
    print("NOT... Provided correct input date format in input file....!")
    data_format = pd.read_csv(input_file_name,parse_dates=['Date'], dayfirst=True)
    #print(df['Date'])
    data_format['Date'] = pd.to_datetime(data_format['Date'],format='%m/%d/%Y')
    data_format['Date'] = data_format['Date'].dt.strftime('%m/%d/%Y')
    data_format.to_csv(dest_file_name,index=False)
    print(data_format['Date'])

Upvotes: 0

Ashu007
Ashu007

Reputation: 795

It would be better if you use this it'll give you the date format in MM-DD-YYYY the you can apply strftime:

df['Date_ColumnName'] = pd.to_datetime(df['Date_ColumnName'], dayfirst = False, yearfirst = False)

Upvotes: 2

Stephen Rauch
Stephen Rauch

Reputation: 49814

Given your sample data, with the addition of a NaT, this works:

Code:

df.date.apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])

Test Code:

import pandas as pd

df = pd.DataFrame([
    [['']],
    [['May 23rd, 2011']],
    [['January 1st, 2010']],
    [['Apr. 15, 2008']],
    [['07-11-2013']],
    [['9/01/1995']],
    [['04/15/2000']],
    [['11/22/68']],
    [['12/1997']],
    [['08/2002']],
    [['2014']],
    [['2016']],
], columns=['date'])

df['clean_date'] = df.date.apply(
    lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])

print(df)

Results:

                   date  clean_date
0                    []         NaT
1      [May 23rd, 2011]  05/23/2011
2   [January 1st, 2010]  01/01/2010
3       [Apr. 15, 2008]  04/15/2008
4          [07-11-2013]  07/11/2013
5           [9/01/1995]  09/01/1995
6          [04/15/2000]  04/15/2000
7            [11/22/68]  11/22/1968
8             [12/1997]  12/01/1997
9             [08/2002]  08/01/2002
10               [2014]  01/01/2014
11               [2016]  01/01/2016

Upvotes: 9

Related Questions