Reputation: 1088
I have a large dataframe with multiple date formats and I want to clean this dataset but I did like to know if its possible to find this date columns and change its formats without single covering one column at a time.
This is how I pulled all columns that starts with a prefix date
date_df = df.loc[:,df.columns.str.startswith("date")]
Then from here I wrote a function to change date formats from string to datetime Y-M-D.
from datetime import datetime
def date_fomated():
dates = df.loc[:,date_df.columns.datetime.strftime('%m/%d/%Y')]
return dates
This has not worked..My expected result is create a function that subsets date columns using prefix "date" the format dates in those multiple date columns to Y-M-D.
Here is a small dataset to use
import pandas as pd
data = [['2020-06-24','2020-01-24'],['2020-05-04','2020-06-24'],['2018-11-24','2018-11-24'],['2018-11-24','2018-11-24']]
df = pd.DataFrame(data,columns=['date_today','date_adm'],dtype=float)
print (df)
Kindly assist
Upvotes: 1
Views: 494
Reputation: 26676
Please filter using df.filter(like='date', axis=1)
(contains 'date') or df.filter(regex='^date', axis=1)(starts with date) and apply
pd.to_datetimeto the
df`
Original df dtypes
df.filter(like='date', axis=1).dtypes#filter columns that contain date and check their dtypes
df.dtypes
date_today object
date_adm object
dtype: object
Solution
df.filter(like='date', axis=1).apply(lambda x: pd.to_datetime(x))
New dtypes
df.filter(like='date', axis=1).apply(lambda x: pd.to_datetime(x)).dtypes
df.dtypes
date_today datetime64[ns]
date_adm datetime64[ns]
dtype: object
Upvotes: 1
Reputation: 23099
lets use stack
and filter
df = pd.DataFrame({'date1' : ['20200105','20210305'],
'date2' : [20200103, 20210309],
'id' : [0,1] })
date1 date2 id
0 20200105 20200103 0
1 20210305 20210309 1
df.loc[:,df.filter(like='date').columns] = df.filter(like='date')\
.stack()\
.apply(pd.to_datetime,
format='%Y%m%d')\
.unstack()
print(df)
date1 date2 id
0 2020-01-05 2020-01-03 0
1 2021-03-05 2021-03-09 1
print(df.dtypes)
date1 datetime64[ns]
date2 datetime64[ns]
id int64
dtype: object
Upvotes: 1