Reputation: 671
I have a column called Time in a dataframe that looks like this:
599359 12:32:25
326816 17:55:22
326815 17:55:22
358789 12:48:25
361553 12:06:45
...
814512 21:22:07
268266 18:57:31
659699 14:28:20
659698 14:28:20
268179 17:48:53
Name: Time, Length: 546967, dtype: object
Right now it has the object dtype. I've tried the following to convert it to a datetime:
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time
I understand that the .dt.time accessor is needed to prevent the year and month from being added, but I believe this is causing the dtype to revert to object.
Any workarounds? I know I could do
df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True)
but I have over 500,000 rows and this is taking forever.
Upvotes: 4
Views: 3925
Reputation: 2508
When you do this:
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time
you're converting the 'Time' column to the pandas object dtype, and each "object" in it is a Python datetime.time.
The pandas datetime dtype (datetime64[ns]) is a different type from Python's datetime.datetime objects, and it does not support time-only values (i.e. you can't have pandas treat the column as a datetime without a date component). This is why the dtype is changing to object.
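A minimal sketch of the dtype change described above, using a small made-up series (the sample values and variable names are illustrative, and utc=True is left out for brevity):

```python
import datetime
import pandas as pd

# Hypothetical sample data standing in for the real 'Time' column.
s = pd.Series(["12:32:25", "17:55:22", "12:48:25"])

# Vectorized parse: the result is a pandas datetime64 column
# (a default date is attached to every value).
parsed = pd.to_datetime(s, format="%H:%M:%S", errors="coerce")
print(parsed.dtype)

# .dt.time drops the date, but each element becomes a plain Python
# datetime.time, so pandas can only store the column as object.
times = parsed.dt.time
print(times.dtype)
print(type(times.iloc[0]))
```

The first print shows a datetime64 dtype, the second shows object, and the third shows that the individual elements are datetime.time instances.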
In your second approach,
df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True)
something slightly different happens. Here you're applying pd.to_datetime to each scalar element of the 'Time' series. Take a look at the return types of the function in the docs, but basically the time values in your df are converted to pd.Timestamp objects dated 1 January 1900 (i.e. a default date is added).
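To illustrate the default date, here is a small sketch (sample values are made up, and utc=True is omitted). It also suggests that the element-wise apply is not needed to get this result: calling pd.to_datetime once on the whole series and simply not chaining .dt.time should produce the same values without the per-row overhead.

```python
import pandas as pd

# A single time string parsed on its own becomes a Timestamp
# with the default date 1900-01-01 filled in.
ts = pd.to_datetime("12:32:25", format="%H:%M:%S")
print(ts)

# Hypothetical sample data.
s = pd.Series(["12:32:25", "17:55:22"])

# Element-wise: one pd.to_datetime call per row (slow on large frames).
via_apply = s.apply(pd.to_datetime, format="%H:%M:%S")

# Vectorized: one call for the whole column.
vectorized = pd.to_datetime(s, format="%H:%M:%S")
print(vectorized)
```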
So pandas is behaving correctly. If you only want the times, it's fine to keep datetime.time objects in the column, but to operate on them you'll probably rely on many (slow) df.apply calls. Alternatively, just keep the default date of 1900-01-01; then you can add/subtract the datetime columns and get the speed advantage of pandas. Strip off the date when you're done with it.
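A rough sketch of that workflow, assuming a small made-up frame (the 30-minute shift and the derived column names are purely illustrative):

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame.
df = pd.DataFrame({"Time": ["12:32:25", "17:55:22", "12:48:25"]})

# Keep the default 1900-01-01 date so the column stays datetime64.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S", errors="coerce")

# Vectorized arithmetic works directly on the datetime column.
shifted = df["Time"] + pd.Timedelta(minutes=30)
elapsed = df["Time"] - df["Time"].min()

# Strip the date only at the end, either as strings or as
# datetime.time objects (the latter is an object-dtype column again).
as_strings = shifted.dt.strftime("%H:%M:%S")
as_time_objects = shifted.dt.time

print(as_strings.tolist())
print(elapsed.dt.total_seconds().tolist())
```

Doing the arithmetic first and dropping the date last keeps every intermediate step on the fast datetime64 path.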
Upvotes: 4