LivingstoneM
LivingstoneM

Reputation: 1088

Function to Change date formats of multiple date columns

I have a large dataframe with multiple date formats and I want to clean this dataset but I did like to know if its possible to find this date columns and change its formats without single covering one column at a time.

This is how I pulled all columns that starts with a prefix date

date_df = df.loc[:,df.columns.str.startswith("date")] 

Then from here I wrote a function to change date formats from string to datetime Y-M-D.

from datetime import datetime
def date_fomated():
dates = df.loc[:,date_df.columns.datetime.strftime('%m/%d/%Y')]
return dates

This has not worked..My expected result is create a function that subsets date columns using prefix "date" the format dates in those multiple date columns to Y-M-D.

Here is a small dataset to use

import pandas as pd
data = [['2020-06-24','2020-01-24'],['2020-05-04','2020-06-24'],['2018-11-24','2018-11-24'],['2018-11-24','2018-11-24']]
df = pd.DataFrame(data,columns=['date_today','date_adm'],dtype=float)
print (df)

Kindly assist

Upvotes: 1

Views: 494

Answers (2)

wwnde
wwnde

Reputation: 26676

Please filter using df.filter(like='date', axis=1)(contains 'date') or df.filter(regex='^date', axis=1)(starts with date) and apply pd.to_datetimeto thedf`

Original df dtypes

 df.filter(like='date', axis=1).dtypes#filter columns that contain date and check their dtypes
    
    df.dtypes
    date_today    object
    date_adm      object
    dtype: object

Solution

df.filter(like='date', axis=1).apply(lambda x: pd.to_datetime(x))

New dtypes

df.filter(like='date', axis=1).apply(lambda x: pd.to_datetime(x)).dtypes


df.dtypes
date_today    datetime64[ns]
date_adm      datetime64[ns]
dtype: object

Upvotes: 1

Umar.H
Umar.H

Reputation: 23099

lets use stack and filter

df = pd.DataFrame({'date1' : ['20200105','20210305'],
                  'date2' : [20200103, 20210309],
                  'id' : [0,1] })


      date1     date2  id
0  20200105  20200103   0
1  20210305  20210309   1


df.loc[:,df.filter(like='date').columns] = df.filter(like='date')\
                                                .stack()\
                                                .apply(pd.to_datetime,
                                                       format='%Y%m%d')\
                                                .unstack()
print(df)

       date1      date2  id
0 2020-01-05 2020-01-03   0
1 2021-03-05 2021-03-09   1

print(df.dtypes)

date1    datetime64[ns]
date2    datetime64[ns]
id                int64
dtype: object

Upvotes: 1

Related Questions