Reputation: 382
I know this question was asked multiple times and I went through the solutions, however I couldn't find a solution to my problem
I do have two time columns, one column is time and one column is date column. What I would like to achieve is to have a merged column, i.e. timestamp column
Column A looks like this
0 "3.11.20"
1 "3.11.20"
2 "3.11.20"
3 "3.11.20"
4 "3.11.20"
1087 "5.11.20"
1088 "5.11.20"
1089 "5.11.20"
1090 "5.11.20"
1091 "5.11.20"
Column B looks like this
0 "11:38:48"
1 "11:39:44"
2 "11:41:35"
3 "11:42:31"
4 "11:43:27"
1087 "14:51:10"
1088 "14:52:01"
1089 "14:52:53"
1090 "14:53:44"
1091 "14:54:36"
My first approach was following, with follwing output
df["Timestamp"]=pd.to_datetime(df["colA"]+" "+df["colB"])
Unknown string format: "3.11.20" "11:38:48"
So my next approach was to transform each column indiviually to check the error, and got the following error
pd.to_datetime(raw_data["colA"],
format="%d.%m.%y")
time data '"3.11.20"' does not match format '%d.%m.%y' (match)
I am negatively surprised about having so much issues transforming these string columns into datetime format. I would be very grateful for any advice. I assume this might be due to padding zeros
Upvotes: 1
Views: 110
Reputation: 862641
Use format with ""
values with to_datetime
for convert wrong values to NaT
for missing values for datetimes:
df["Timestamp"]=pd.to_datetime(df["colA"]+df["colB"],
format='"%d.%m.%y""%H:%M:%S"',
errors='coerce')
Or remove "
by Series.str.strip
:
df["Timestamp"]=pd.to_datetime(df["colA"].str.strip('"')+" "+
df["colB"].str.strip('"'),
format='%d.%m.%y %H:%M:%S',
errors='coerce')
print (df)
colA colB Timestamp
0.0 "3.11.20" "11:38:48" 2020-11-03 11:38:48
1.0 "3.11.20" "11:39:44" 2020-11-03 11:39:44
2.0 "3.11.20" "11:41:35" 2020-11-03 11:41:35
3.0 "3.11.20" "11:42:31" 2020-11-03 11:42:31
4.0 "3.11.20" "11:43:27" 2020-11-03 11:43:27
1087.0 "5.11.20" "14:51:10" 2020-11-05 14:51:10
1088.0 "5.11.20" "14:52:01" 2020-11-05 14:52:01
1089.0 "5.11.20" "14:52:53" 2020-11-05 14:52:53
1090.0 "5.11.20" "14:53:44" 2020-11-05 14:53:44
1091.0 "5.11.20" "14:54:36" 2020-11-05 14:54:36
Upvotes: 1
Reputation: 26676
Data
print(df)
ColumnA ColumnB
0 "3.11.20" "11:38:48"
1 "3.11.20" "11:39:44"
2 "3.11.20" "11:41:35"
3 "3.11.20" "11:42:31"
4 "3.11.20" "11:43:27"
1087 "5.11.20" "14:51:10"
1088 "5.11.20" "14:52:01"
1089 "5.11.20" "14:52:53"
1090 "5.11.20" "14:53:44"
1091 "5.11.20" "14:54:36"
Solution
Strip the "
and use .str.cat
to concat. Coerce to datetime using .pd.to_datetime
df['timestamp']=pd.to_datetime(df['ColumnA'].str.strip('"').str.cat(df['ColumnB'].str.strip('"'), sep=' '), format="%d.%m.%y %H:%M:%S")
ColumnA ColumnB timestamp
0 "3.11.20" "11:38:48" 2020-11-03 11:38:48
1 "3.11.20" "11:39:44" 2020-11-03 11:39:44
2 "3.11.20" "11:41:35" 2020-11-03 11:41:35
3 "3.11.20" "11:42:31" 2020-11-03 11:42:31
4 "3.11.20" "11:43:27" 2020-11-03 11:43:27
1087 "5.11.20" "14:51:10" 2020-11-05 14:51:10
1088 "5.11.20" "14:52:01" 2020-11-05 14:52:01
1089 "5.11.20" "14:52:53" 2020-11-05 14:52:53
1090 "5.11.20" "14:53:44" 2020-11-05 14:53:44
1091 "5.11.20" "14:54:36" 2020-11-05 14:54:36
Upvotes: 0
Reputation: 69
if you are using read_csv use parse_dates=[['colA', 'colB']] in parameter, and after use your code
df["Timestamp"]=pd.to_datetime(df["colA"]+" "+df["colB"])
Upvotes: 0