Reputation: 1801
I have a date column in a pandas.DataFrame
in various date time formats and stored as list object, like the following:
date
1 [May 23rd, 2011]
2 [January 1st, 2010]
...
99 [Apr. 15, 2008]
100 [07-11-2013]
...
256 [9/01/1995]
257 [04/15/2000]
258 [11/22/68]
...
360 [12/1997]
361 [08/2002]
...
463 [2014]
464 [2016]
For the sake of convenience, I want to convert them all to MM/DD/YYYY
format. It doesn't seem possible to use regex replace() function to do this, since one cannot execute this operation over list objects. Also, to use strptime() for each cell will be too time-consuming.
What will be the easier way to convert them all to the desired MM/DD/YYYY
format? I found it very hard to do this on list objects within a dataframe.
Note: for cell values of the form [YYYY]
(e.g., [2014]
and [2016]
), I will assume they are the first day of that year (i.e., January 1, 1968) and for cell values such as [08/2002]
(or [8/2002]
), I will assume they the first day of the month of that year (i.e., August 1, 2002).
Upvotes: 3
Views: 15902
Reputation: 79
Provided code will work for following scenarios.
import pandas as pd
'''
* checking provided input file date format correct or not
* if format is correct change date format from M/D/YY to MM/DD/YY
* else date format is not correct in input file
Date format change form ANY FORMAT to MM/DD/YY
'''
input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/predictions.csv'
dest_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/Enrich.csv'
#input_file_name = 'C:/Users/Admin/Desktop/SarenderReddy/enrichment.csv'
read_data = pd.read_csv(input_file_name)
print(pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all())
if pd.to_datetime(read_data['Date'], format='%m/%d/%Y', errors='coerce').notnull().all():
print("Provided correct input date format in input file....!")
read_data['Date'] = pd.to_datetime(read_data['Date'],format='%m/%d/%Y')
read_data['Date'] = read_data['Date'].dt.strftime('%m/%d/%Y')
read_data.to_csv(dest_file_name,index=False)
print(read_data['Date'])
else:
print("NOT... Provided correct input date format in input file....!")
data_format = pd.read_csv(input_file_name,parse_dates=['Date'], dayfirst=True)
#print(df['Date'])
data_format['Date'] = pd.to_datetime(data_format['Date'],format='%m/%d/%Y')
data_format['Date'] = data_format['Date'].dt.strftime('%m/%d/%Y')
data_format.to_csv(dest_file_name,index=False)
print(data_format['Date'])
Upvotes: 0
Reputation: 795
It would be better if you use this it'll give you the date format in MM-DD-YYYY the you can apply strftime:
df['Date_ColumnName'] = pd.to_datetime(df['Date_ColumnName'], dayfirst = False, yearfirst = False)
Upvotes: 2
Reputation: 49814
Given your sample data, with the addition of a NaT
, this works:
df.date.apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
import pandas as pd
df = pd.DataFrame([
[['']],
[['May 23rd, 2011']],
[['January 1st, 2010']],
[['Apr. 15, 2008']],
[['07-11-2013']],
[['9/01/1995']],
[['04/15/2000']],
[['11/22/68']],
[['12/1997']],
[['08/2002']],
[['2014']],
[['2016']],
], columns=['date'])
df['clean_date'] = df.date.apply(
lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
print(df)
date clean_date
0 [] NaT
1 [May 23rd, 2011] 05/23/2011
2 [January 1st, 2010] 01/01/2010
3 [Apr. 15, 2008] 04/15/2008
4 [07-11-2013] 07/11/2013
5 [9/01/1995] 09/01/1995
6 [04/15/2000] 04/15/2000
7 [11/22/68] 11/22/1968
8 [12/1997] 12/01/1997
9 [08/2002] 08/01/2002
10 [2014] 01/01/2014
11 [2016] 01/01/2016
Upvotes: 9