SMS
SMS

Reputation: 382

Transform string column into datetime

I know this question was asked multiple times and I went through the solutions, however I couldn't find a solution to my problem

I do have two time columns, one column is time and one column is date column. What I would like to achieve is to have a merged column, i.e. timestamp column

Column A looks like this

0       "3.11.20"
1       "3.11.20"
2       "3.11.20"
3       "3.11.20"
4       "3.11.20"
   
1087    "5.11.20"
1088    "5.11.20"
1089    "5.11.20"
1090    "5.11.20"
1091    "5.11.20"

Column B looks like this

0       "11:38:48"
1       "11:39:44"
2       "11:41:35"
3       "11:42:31"
4       "11:43:27"
   
1087    "14:51:10"
1088    "14:52:01"
1089    "14:52:53"
1090    "14:53:44"
1091    "14:54:36"

My first approach was following, with follwing output

df["Timestamp"]=pd.to_datetime(df["colA"]+" "+df["colB"])


Unknown string format: "3.11.20" "11:38:48"

So my next approach was to transform each column indiviually to check the error, and got the following error

pd.to_datetime(raw_data["colA"],
                     format="%d.%m.%y")

time data '"3.11.20"' does not match format '%d.%m.%y' (match)

I am negatively surprised about having so much issues transforming these string columns into datetime format. I would be very grateful for any advice. I assume this might be due to padding zeros

Upvotes: 1

Views: 110

Answers (3)

jezrael
jezrael

Reputation: 862641

Use format with "" values with to_datetime for convert wrong values to NaT for missing values for datetimes:

df["Timestamp"]=pd.to_datetime(df["colA"]+df["colB"], 
                               format='"%d.%m.%y""%H:%M:%S"',
                               errors='coerce')

Or remove " by Series.str.strip:

df["Timestamp"]=pd.to_datetime(df["colA"].str.strip('"')+" "+
                               df["colB"].str.strip('"'), 
                               format='%d.%m.%y %H:%M:%S', 
                               errors='coerce')

print (df)
             colA        colB           Timestamp
0.0     "3.11.20"  "11:38:48" 2020-11-03 11:38:48
1.0     "3.11.20"  "11:39:44" 2020-11-03 11:39:44
2.0     "3.11.20"  "11:41:35" 2020-11-03 11:41:35
3.0     "3.11.20"  "11:42:31" 2020-11-03 11:42:31
4.0     "3.11.20"  "11:43:27" 2020-11-03 11:43:27
1087.0  "5.11.20"  "14:51:10" 2020-11-05 14:51:10
1088.0  "5.11.20"  "14:52:01" 2020-11-05 14:52:01
1089.0  "5.11.20"  "14:52:53" 2020-11-05 14:52:53
1090.0  "5.11.20"  "14:53:44" 2020-11-05 14:53:44
1091.0  "5.11.20"  "14:54:36" 2020-11-05 14:54:36

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Data

print(df)

     ColumnA     ColumnB
0     "3.11.20"  "11:38:48"
1     "3.11.20"  "11:39:44"
2     "3.11.20"  "11:41:35"
3     "3.11.20"  "11:42:31"
4     "3.11.20"  "11:43:27"
1087  "5.11.20"  "14:51:10"
1088  "5.11.20"  "14:52:01"
1089  "5.11.20"  "14:52:53"
1090  "5.11.20"  "14:53:44"
1091  "5.11.20"  "14:54:36"    

Solution

Strip the " and use .str.cat to concat. Coerce to datetime using .pd.to_datetime

df['timestamp']=pd.to_datetime(df['ColumnA'].str.strip('"').str.cat(df['ColumnB'].str.strip('"'), sep=' '), format="%d.%m.%y %H:%M:%S")

      ColumnA     ColumnB           timestamp
0     "3.11.20"  "11:38:48" 2020-11-03 11:38:48
1     "3.11.20"  "11:39:44" 2020-11-03 11:39:44
2     "3.11.20"  "11:41:35" 2020-11-03 11:41:35
3     "3.11.20"  "11:42:31" 2020-11-03 11:42:31
4     "3.11.20"  "11:43:27" 2020-11-03 11:43:27
1087  "5.11.20"  "14:51:10" 2020-11-05 14:51:10
1088  "5.11.20"  "14:52:01" 2020-11-05 14:52:01
1089  "5.11.20"  "14:52:53" 2020-11-05 14:52:53
1090  "5.11.20"  "14:53:44" 2020-11-05 14:53:44
1091  "5.11.20"  "14:54:36" 2020-11-05 14:54:36

Upvotes: 0

Holzer
Holzer

Reputation: 69

if you are using read_csv use parse_dates=[['colA', 'colB']] in parameter, and after use your code

df["Timestamp"]=pd.to_datetime(df["colA"]+" "+df["colB"])

Upvotes: 0

Related Questions