Monika Iyer
Monika Iyer

Reputation: 23

How to standardise different date formats in pandas?

I have a dataset in csv format which contains dates in a column. I have imported this dataset into python pandas, and this date column is shown as an object. I need to convert this column to date time but i have a problem. This date column has date format in two formats 1. 11/7/2013 11:51 2. 13-07-2013 08:33:16

I need to convert one format to another one in order to have a standard date format in my python to do analysis. How can i do this?

There are many rows of date in both these formats, so when i try to convert second format to first format using the below code

print(df['date'].apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M')))

i get the below error

ValueError: time data '13-07-2013 08:33:16' does not match format '%d/%m/%Y %H:%M' (match)

so what would be the best method to standardise this column in one format?

Upvotes: 1

Views: 4804

Answers (4)

Cam
Cam

Reputation: 1731

In Pandas v1 to_datetime function is very robust and can handle most date formats. With your example dates it is as easy as calling to_datetime on your series.

d = ['11/7/2013 11:51', '13-07-2013 08:33:16']
df = pd.DataFrame({'dates': d})
df = pd.to_datetime(df['dates'])
df

output

0   2013-11-07 11:51:00
1   2013-07-13 08:33:16
Name: dates, dtype: datetime64[ns]

Just how robust is to_datetime?

lets tests it using a dataset of 25 different date styles from here.

http = r'https://www.ibm.com/docs/en/cmofz/10.1.0?topic=SSQHWE_10.1.0/com.ibm.ondemand.mp.doc/arsa0257.htm'
table = pd.read_html(http)
df = table[0]
df

# test which datestyles pandas can convert
df['Example_clean'] = pd.to_datetime(df['Example'])
print(df.dtypes)
df
# yes it converted all 25 different date formats!!
# Note - when using a time it automatically uses todays date.

output enter image description here

Upvotes: 0

rpanai
rpanai

Reputation: 13437

You can try the following

import pandas as pd
import numpy as np


n=1000
ch = ['13-07-2013 08:33:16', '13/07/2013 08:33:16']
df = pd.DataFrame({"date": np.random.choice(ch,n)})

df["date"] = df["date"].str.replace("/","-").astype("M8[us]")

Update I just realize that the format I'm using is not the same you would like. I strongly suggest you to use a standard format as YYYY-MM-DD and datetime as type instead of string. There are a lot of post that explains why this is more efficient on RAM and then faster.

A small comparative for a df with just 1000 rows

%%timeit
out = df["date"].str.replace("/","-").astype("M8[us]")

146 ms ± 5.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%%timeit
out = df["date"].apply(lambda x: pd.to_datetime(x)\
                                   .strftime('%d/%m/%Y %H:%M'))

396 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


%%timeit
out = df['date'].apply(lambda x: pd.to_datetime(x,
                       format='%d/%m/%Y %H:%M',
                       infer_datetime_format= True))

425 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Upvotes: 0

Frenchy
Frenchy

Reputation: 17007

its better to use strftime():

df = pd.DataFrame({'Date': ['11/7/2013 11:51','13-07-2013 08:33:16']})
df['Clean_Date'] = df.Date.apply(lambda x: pd.to_datetime(x).strftime('%d/%m/%Y %H:%M'))
print(df)

output:

                  Date        Clean_Date
0      11/7/2013 11:51  07/11/2013 11:51
1  13-07-2013 08:33:16  13/07/2013 08:33

Upvotes: 1

cfort
cfort

Reputation: 2776

Try removing the format parameter and setting infer_datetime_format=Truein the arguments you pass to pd.to_datetime

Upvotes: 1

Related Questions