Priya Chauhan
Priya Chauhan

Reputation: 485

Format date columns in pandas dataframe

I have a pandas dataframe like this:

date_1 date_2 date_3
0020-01-31 0020-01-31 2020-01-31
0021-01-01 0021-12-31 2021-02-28
0021-01-01 0021-12-31 2021-02-28

AS the date_1 is string, so I have to replace 0020 to 2020 first as it is year but I am getting no change in the dataframe

my_df = my_df.replace(regex=['0020'], value='2020')
my_df.head()

Thanks for Help in advance!

Upvotes: 1

Views: 453

Answers (1)

nikeros
nikeros

Reputation: 3379

Well in theory you'd like to correct all the bad dates in every column, not just 2020:

import pandas as pd

df = pd.DataFrame({"date_1":["0020-01-01", "0021-01-02", "0000-01-03"], "date_2":["0020-01-01", "0021-01-02", "0000-01-03"], "date_3":["2020-01-01", "2021-01-02", "2000-01-03"]}) 

df.apply(lambda x:x.str.replace("^00", "20", regex=True))

FOLLOW UP

If some of columns are made of Timestamp and not str, the method above will fail. The following will avoid that problem but it needs the help of library re (as far as I know, you cannot use regex with normal str replace):

import re
import pandas as pd

df["date_3"] = pd.to_datetime(df["date_3"])

def swap(x): return re.sub("^00", "20", x) if type(x) is str else x

df.applymap(swap)

OUTPUT

        date1       date2       date3
0  2020-01-01  2020-01-01  2020-01-01
1  2021-01-02  2021-01-02  2021-01-02
2  2000-01-03  2000-01-03  2000-01-03

If you check .dtypes:

date_1    object
date_2    object
date_3    object
dtype: object

Upvotes: 1

Related Questions