Reputation: 2081
I have a column, date, whose date strings come in 6 different lengths:
df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08', 1: '2020-03-21T17:43:03', 2: '2020-03-21T13:13:30', 3: '2020-03-21T20:43:02', 4: '3/8/20 5:31', 5: '3/8/20 5:19', 6: '3/22/20 23:45', 7: '3/22/20 23:45', 8: '2/1/2020 11:53', 9: '2/1/2020 10:53', 10: '1/31/2020 15:20', 11: '1/31/2020 10:37', 12: '2020-04-04 23:34:21', 13: '2020-04-04 23:34:21'}},
index=range(0,14))
I need to convert all of these datetime strings to dates. My approach is:
1. Find the first whitespace character and extract the date part before it.
2. Parse that date with a format chosen by its string length (each length has its own date format, as you can see in the format argument below).
3. Apply (2) to the corresponding rows of the dataframe df.
You can see this approach here:
df.loc[df["date"].str.find(" ") == 10, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == -1, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == 6, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 6, "date"].str[0:6], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 7, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 7, "date"].str[0:7], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 8, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 8, "date"].str[0:8], format="%m/%d/%Y")
df.loc[df["date"].str.find(" ") == 9, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 9, "date"].str[0:9], format="%m/%d/%Y")
Everything works until step 3, where I try to apply all the format changes to the dataframe, and I can't understand why it doesn't produce the expected result. Any suggestions?
By the way, it has to be scalable (I have a lot of rows per format string).
Upvotes: 0
Views: 466
Reputation: 863651
For me, converting all values to datetimes and then removing the times works: use Series.dt.floor if the output should be datetimes, or Series.dt.date if the output should be Python dates:
df['date'] = pd.to_datetime(df['date']).dt.floor('d')
# alternatively, for Python date objects:
#df['date'] = pd.to_datetime(df['date']).dt.date
print (df)
date
0 2020-03-21
1 2020-03-21
2 2020-03-21
3 2020-03-21
4 2020-03-08
5 2020-03-08
6 2020-03-22
7 2020-03-22
8 2020-02-01
9 2020-02-01
10 2020-01-31
11 2020-01-31
12 2020-04-04
13 2020-04-04
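To illustrate the difference between the two options, here is a minimal sketch on two sample strings in a single ISO format (the names stamps, floored and as_dates are mine, not from the question). dt.floor('d') keeps a datetime column with the time reset to midnight, while dt.date gives plain Python date objects:

```python
import datetime
import pandas as pd

# Two sample strings in one (ISO) format, so parsing is unambiguous.
stamps = pd.to_datetime(pd.Series(["2020-03-21T10:13:08", "2020-04-04T23:34:21"]))

floored = stamps.dt.floor("d")  # still a datetime64 column, time set to 00:00:00
as_dates = stamps.dt.date       # object column holding datetime.date values

print(floored.dtype)
print(type(as_dates.iloc[0]))
```

The floored version keeps the .dt accessor available for later steps, which is usually the more convenient choice on large frames.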
Your solution can be simplified: take the first 10 characters, split on a possible space, and keep the first piece:
df['date'] = pd.to_datetime(df['date'].str[:10].str.split().str[0])
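If you would rather not rely on format inference, a possible variant is to try each expected format explicitly and keep the first one that parses. This is only a sketch under my own assumptions: the list of formats is what I read from the sample data, and raw, date_part and parsed are illustrative names. Each pass is vectorized, so the cost is one pass over the column per format:

```python
import pandas as pd

raw = pd.Series([
    "2020-03-21T10:13:08",
    "3/8/20 5:31",
    "3/22/20 23:45",
    "2/1/2020 11:53",
    "1/31/2020 10:37",
    "2020-04-04 23:34:21",
])

# Keep only the date part: everything before the first "T" or whitespace.
date_part = raw.str.extract(r"^([^T\s]+)", expand=False)

# Assumed formats, read off the sample data (month first in the slash formats).
formats = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"]

parsed = None
for fmt in formats:
    # Rows that do not match this format become NaT instead of raising.
    attempt = pd.to_datetime(date_part, format=fmt, errors="coerce")
    # Fill only the rows that earlier formats could not parse.
    parsed = attempt if parsed is None else parsed.fillna(attempt)

print(parsed)
```

Any row still NaT at the end matched none of the listed formats, which makes unexpected inputs easy to spot with parsed.isna().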
Upvotes: 2
Reputation: 712
import pandas as pd
df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08', 1: '2020-03-21T17:43:03', 2: '2020-03-21T13:13:30', 3: '2020-03-21T20:43:02', 4: '3/8/20 5:31', 5: '3/8/20 5:19', 6: '3/22/20 23:45', 7: '3/22/20 23:45', 8: '2/1/2020 11:53', 9: '2/1/2020 10:53', 10: '1/31/2020 15:20', 11: '1/31/2020 10:37', 12: '2020-04-04 23:34:21', 13: '2020-04-04 23:34:21'}},
index=range(0,14))
df
date
0 2020-03-21T10:13:08
1 2020-03-21T17:43:03
2 2020-03-21T13:13:30
3 2020-03-21T20:43:02
4 3/8/20 5:31
5 3/8/20 5:19
6 3/22/20 23:45
7 3/22/20 23:45
8 2/1/2020 11:53
9 2/1/2020 10:53
10 1/31/2020 15:20
11 1/31/2020 10:37
12 2020-04-04 23:34:21
13 2020-04-04 23:34:21
df['date'] = pd.to_datetime(df['date'])
df
date
0 2020-03-21 10:13:08
1 2020-03-21 17:43:03
2 2020-03-21 13:13:30
3 2020-03-21 20:43:02
4 2020-03-08 05:31:00
5 2020-03-08 05:19:00
6 2020-03-22 23:45:00
7 2020-03-22 23:45:00
8 2020-02-01 11:53:00
9 2020-02-01 10:53:00
10 2020-01-31 15:20:00
11 2020-01-31 10:37:00
12 2020-04-04 23:34:21
13 2020-04-04 23:34:21
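One caveat, as far as I know: since pandas 2.0, pd.to_datetime infers a single format from the first value and raises on a column that mixes formats unless format="mixed" is passed. A hedged sketch that covers both behaviours follows. The version check is just my own way of guarding it, and raw is a shortened sample of the question's data:

```python
import pandas as pd

raw = pd.Series([
    "2020-03-21T10:13:08",
    "3/8/20 5:31",
    "2/1/2020 11:53",
    "2020-04-04 23:34:21",
])

major = int(pd.__version__.split(".")[0])
if major >= 2:
    # pandas >= 2.0: ask for per-element format inference explicitly.
    parsed = pd.to_datetime(raw, format="mixed")
else:
    # Older pandas already parses each element individually.
    parsed = pd.to_datetime(raw)

print(parsed)
```

Per-element inference treats the slash dates as month first by default, so check that this matches your data before relying on it.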
Upvotes: 1