python - convert multiple string datetimes format into a specific date format

Question

I have a date column whose strings come in 6 different lengths and formats:

df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08',  1: '2020-03-21T17:43:03',  2: '2020-03-21T13:13:30',  3: '2020-03-21T20:43:02',  4: '3/8/20 5:31',  5: '3/8/20 5:19',  6: '3/22/20 23:45',  7: '3/22/20 23:45',  8: '2/1/2020 11:53',  9: '2/1/2020 10:53',  10: '1/31/2020 15:20',  11: '1/31/2020 10:37',  12: '2020-04-04 23:34:21',  13: '2020-04-04 23:34:21'}}, 
             index=range(0,14))

I need to convert all of these datetime strings to a single date format. The approach I'm using is:

1. Find the first white space and extract the date.
2. Change its format based on the string length (each string length has its own date format, as you can see below in the format argument).
3. Do (2) for the respective rows in the dataframe df.

You can see this approach here:

df.loc[df["date"].str.find(" ") == 10, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == -1, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == 6, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 6, "date"].str[0:6], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 7, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 7, "date"].str[0:7], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 8, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 8, "date"].str[0:8], format="%m/%d/%Y")
df.loc[df["date"].str.find(" ") == 9, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 9, "date"].str[0:9], format="%m/%d/%Y")

Everything works until step 3, where I'm trying to apply all the format changes to the dataframe, but I can't understand why it doesn't give the expected result. Any suggestions?

By the way, it has to be scalable (I have a lot of rows per format string)

jezrael · Accepted Answer

For me it works to convert all values to datetimes and then remove the times, either with Series.dt.floor if the output should be datetimes or with Series.dt.date if the output should be Python dates:

# datetimes with the time part set to midnight
df['date'] = pd.to_datetime(df['date']).dt.floor('D')
# alternative: Python date objects
#df['date'] = pd.to_datetime(df['date']).dt.date
print(df)
         date
0  2020-03-21
1  2020-03-21
2  2020-03-21
3  2020-03-21
4  2020-03-08
5  2020-03-08
6  2020-03-22
7  2020-03-22
8  2020-02-01
9  2020-02-01
10 2020-01-31
11 2020-01-31
12 2020-04-04
13 2020-04-04
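
To make the difference between the two options concrete, here is a minimal sketch on a small hypothetical two-row series (the sample values and variable names are assumptions for illustration, not the data from the question). It shows that dt.floor keeps a datetime column, while dt.date produces plain Python date objects:

```python
import pandas as pd

# Hypothetical sample; both values share one format so parsing is unambiguous.
s = pd.to_datetime(
    pd.Series(["2020-03-21 10:13:08", "2020-04-04 23:34:21"]),
    format="%Y-%m-%d %H:%M:%S",
)

# Option 1: stays a datetime64 column, time part reset to 00:00:00.
floored = s.dt.floor("D")

# Option 2: becomes a column of datetime.date objects.
dates = s.dt.date

print(floored.dtype)
print(type(dates.iloc[0]))
```

The datetime version is usually the more convenient one if the column will later be used with the .dt accessor, resampling, or date arithmetic.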

Your solution can be simplified: take the first 10 characters, split on the possible space, and keep the first part:

df['date'] = pd.to_datetime(df['date'].str[:10].str.split().str[0])
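
One caveat, as far as I know: from pandas 2.0 on, to_datetime infers a single format from the first value, so a column with mixed formats may raise unless you pass format="mixed" or parse each format explicitly. Below is a rough sketch of the explicit route, which avoids relying on format inference at all. The shortened sample data, the list of formats, and the names date_part and parsed are my own assumptions for illustration:

```python
import pandas as pd

# Shortened, hypothetical sample covering the formats from the question.
df = pd.DataFrame({"date": [
    "2020-03-21T10:13:08",
    "3/8/20 5:31",
    "3/22/20 23:45",
    "2/1/2020 11:53",
    "1/31/2020 10:37",
    "2020-04-04 23:34:21",
]})

# Keep only the date part: everything before the first "T" or whitespace.
date_part = df["date"].str.extract(r"^([^T\s]+)", expand=False)

# Assumed list of date formats present in the column.
formats = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"]

# Try each format on the whole column; rows that do not match become NaT
# and are filled in by the next format.
parsed = pd.to_datetime(date_part, format=formats[0], errors="coerce")
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(date_part, format=fmt, errors="coerce"))

df["date"] = parsed
print(df)
```

Each pass is vectorized over the full column, so the cost grows with the number of formats rather than with a Python loop over rows. Any row that matches none of the listed formats is left as NaT, which makes unexpected inputs easy to spot.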
