Chris

Reputation: 2081

python - convert multiple string datetime formats into a specific date format

I have this date column whose strings come in 6 different lengths:

df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08',  1: '2020-03-21T17:43:03',  2: '2020-03-21T13:13:30',  3: '2020-03-21T20:43:02',  4: '3/8/20 5:31',  5: '3/8/20 5:19',  6: '3/22/20 23:45',  7: '3/22/20 23:45',  8: '2/1/2020 11:53',  9: '2/1/2020 10:53',  10: '1/31/2020 15:20',  11: '1/31/2020 10:37',  12: '2020-04-04 23:34:21',  13: '2020-04-04 23:34:21'}}, 
             index=range(0,14))

I need to convert all of these different datetime strings to a date format. The approach I'm using is:

  1. Find the first white space and extract the date

  2. Convert it using the format that matches its length (each length has its own date format, as you can see in the format argument below)

  3. Apply (2) to the corresponding rows of the dataframe df.

You can see this approach here:

df.loc[df["date"].str.find(" ") == 10, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == -1, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == 6, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 6, "date"].str[0:6], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 7, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 7, "date"].str[0:7], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 8, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 8, "date"].str[0:8], format="%m/%d/%Y")
df.loc[df["date"].str.find(" ") == 9, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 9, "date"].str[0:9], format="%m/%d/%Y")

Everything works until step 3, where I try to apply all the format changes to the dataframe, but I can't understand why it doesn't give the expected result. Any suggestions?

By the way, it has to be scalable (I have a lot of rows for each format).

Upvotes: 0

Views: 466

Answers (2)

jezrael

Reputation: 863651

For me it works to convert all values to datetimes and then remove the times with Series.dt.floor if the output should be datetimes, or with Series.dt.date if the output should be Python dates:

df['date'] = pd.to_datetime(df['date']).dt.floor('d')
# alternative: Python date objects instead of datetimes
#df['date'] = pd.to_datetime(df['date']).dt.date
print (df)
         date
0  2020-03-21
1  2020-03-21
2  2020-03-21
3  2020-03-21
4  2020-03-08
5  2020-03-08
6  2020-03-22
7  2020-03-22
8  2020-02-01
9  2020-02-01
10 2020-01-31
11 2020-01-31
12 2020-04-04
13 2020-04-04

Your solution can be simplified: take the first 10 characters, split on a possible space, and keep the first piece:

df['date'] = pd.to_datetime(df['date'].str[:10].str.split().str[0])
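
If you would rather not rely on format inference, here is a minimal sketch that tries one explicit format per pattern and keeps the first successful parse for each row. The FORMATS list and the helper name parse_with_formats are my own assumptions, based only on the four patterns visible in the sample data, so adjust them to your real column:

```python
import pandas as pd

# Sample values covering the four patterns from the question.
df = pd.DataFrame({"date": [
    "2020-03-21T10:13:08",
    "3/8/20 5:31",
    "3/22/20 23:45",
    "2/1/2020 11:53",
    "1/31/2020 10:37",
    "2020-04-04 23:34:21",
]})

# Assumed formats, one per pattern seen in the sample data.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%y %H:%M",
    "%m/%d/%Y %H:%M",
]


def parse_with_formats(strings, formats):
    """Hypothetical helper: try each format, keep the first successful parse per row."""
    parsed = None
    for fmt in formats:
        # errors="coerce" turns rows that do not match this format into NaT.
        attempt = pd.to_datetime(strings, format=fmt, errors="coerce")
        # Only rows that are still NaT get filled from the new attempt.
        parsed = attempt if parsed is None else parsed.fillna(attempt)
    return parsed


# normalize() resets the time to midnight, so only the date part remains.
df["date"] = parse_with_formats(df["date"], FORMATS).dt.normalize()
print(df)
```

Each pass is vectorized, so the cost grows with the number of formats rather than with per-row Python calls. Rows that match none of the formats stay NaT, so df["date"].isna() shows any pattern you did not anticipate.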

Upvotes: 2

Gustavo Gradvohl

Reputation: 712

import pandas as pd

df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08',  1: '2020-03-21T17:43:03',  2: '2020-03-21T13:13:30',  3: '2020-03-21T20:43:02',  4: '3/8/20 5:31',  5: '3/8/20 5:19',  6: '3/22/20 23:45',  7: '3/22/20 23:45',  8: '2/1/2020 11:53',  9: '2/1/2020 10:53',  10: '1/31/2020 15:20',  11: '1/31/2020 10:37',  12: '2020-04-04 23:34:21',  13: '2020-04-04 23:34:21'}}, 
             index=range(0,14))
df
    date
0   2020-03-21T10:13:08
1   2020-03-21T17:43:03
2   2020-03-21T13:13:30
3   2020-03-21T20:43:02
4   3/8/20 5:31
5   3/8/20 5:19
6   3/22/20 23:45
7   3/22/20 23:45
8   2/1/2020 11:53
9   2/1/2020 10:53
10  1/31/2020 15:20
11  1/31/2020 10:37
12  2020-04-04 23:34:21
13  2020-04-04 23:34:21

df['date'] = pd.to_datetime(df['date'])
df
    date
0   2020-03-21 10:13:08
1   2020-03-21 17:43:03
2   2020-03-21 13:13:30
3   2020-03-21 20:43:02
4   2020-03-08 05:31:00
5   2020-03-08 05:19:00
6   2020-03-22 23:45:00
7   2020-03-22 23:45:00
8   2020-02-01 11:53:00
9   2020-02-01 10:53:00
10  2020-01-31 15:20:00
11  2020-01-31 10:37:00
12  2020-04-04 23:34:21
13  2020-04-04 23:34:21
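
A note on versions, as far as I know: from pandas 2.0 on, to_datetime infers a single format from the first value and applies it to the whole column, so the plain call can fail on mixed data like this. Passing format="mixed" asks pandas to parse each value individually. The sketch below falls back to the plain call on older versions that do not know that option; the try/except fallback is my own assumption about how to support both, and the shortened sample frame is only for illustration:

```python
import pandas as pd

# Shortened sample with one value per pattern from the question.
df = pd.DataFrame({"date": [
    "2020-03-21T10:13:08",
    "3/8/20 5:31",
    "2/1/2020 11:53",
    "2020-04-04 23:34:21",
]})

try:
    # pandas >= 2.0: infer the format separately for every value.
    parsed = pd.to_datetime(df["date"], format="mixed")
except ValueError:
    # Older pandas treats "mixed" as a literal format string and raises,
    # but its plain call already parses each value on its own.
    parsed = pd.to_datetime(df["date"])

# Keep only the date part by resetting the time to midnight.
df["date"] = parsed.dt.normalize()
print(df)
```

Per-value inference treats ambiguous strings such as 3/8/20 as month first by default, which matches the output shown above.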

Upvotes: 1
