Reputation: 1211
I have a Pandas DataFrame with date values stored in two columns, in the following formats:
col1: 04-APR-2018 11:04:29
col2: 2018040415203
How can I convert these to timestamps? The dtype of both columns is object.
Upvotes: 43
Views: 177837
Reputation: 23459
There are a few ways to convert column values into timestamps, some more efficient than others. N.B. Passing format= to to_datetime makes the conversion much, much faster (see this post). You can find all possible combinations of datetime format codes at https://strftime.org/.
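from datetime import datetime
import pandas as pd
df = pd.DataFrame({'col1': ['04-APR-2018 11:04:29']})  # the question's col1 value
x = pd.to_datetime(df['col1'], format='%d-%b-%Y %H:%M:%S')  # vectorised, explicit format
y = df['col1'].apply(pd.Timestamp)  # element-wise: pd.Timestamp parses each string
z = df['col1'].apply(datetime.strptime, args=('%d-%b-%Y %H:%M:%S',))  # strptime(value, fmt)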
Ultimately, all three produce the same object (x.equals(y) and x.equals(z) both return True), which looks like:
0 2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]
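To check the speed claim on your own setup, here is a rough benchmark sketch. The 10,000 generated strings are hypothetical test data, and absolute timings depend on the machine and pandas version. Since pandas 2.0, to_datetime infers a format from the first non-null element when none is given, so the gap between the first two timings may be small on recent versions.

```python
import timeit

import pandas as pd

FMT = '%d-%b-%Y %H:%M:%S'

# Hypothetical test data: 10,000 distinct strings in the question's layout.
# Distinct values keep to_datetime's cache of repeated strings from skewing the result.
s = pd.Series(pd.date_range('2018-04-04', periods=10_000, freq='min').strftime(FMT))

timings = {
    'to_datetime with format': timeit.timeit(lambda: pd.to_datetime(s, format=FMT), number=1),
    'to_datetime without format': timeit.timeit(lambda: pd.to_datetime(s), number=1),
    'apply(pd.Timestamp)': timeit.timeit(lambda: s.apply(pd.Timestamp), number=1),
}

for name, seconds in timings.items():
    print(f'{name:28s} {seconds:.3f} s')
```

The element-wise apply variant parses each string separately in Python, so it is usually the slowest of the three.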
If we check the individual values, they are the same too (x[0] == y[0] == z[0] returns True), and each looks like:
Timestamp('2018-04-04 11:04:29')
If we look at the source code, pd.Timestamp is a subclass of datetime.datetime, so all three are ultimately tied to datetime.datetime.
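The subclass relationship can be checked directly. This small sketch just confirms that a pd.Timestamp is a datetime.datetime and compares equal to the equivalent plain datetime.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp('2018-04-04 11:04:29')

print(issubclass(pd.Timestamp, datetime))       # True
print(isinstance(ts, datetime))                 # True
print(ts == datetime(2018, 4, 4, 11, 4, 29))    # True
print(type(ts.to_pydatetime()) is datetime)     # True: converts back to a plain datetime
```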
Upvotes: 4
Reputation: 578
You can try these as well. Try passing infer_datetime_format=True while reading the file (note that pandas 2.0 deprecated this argument, because a strict form of format inference is now the default).
If that fails, try the following:
df2 = pd.to_datetime(df.col1)
or
df2 = pd.to_datetime(df['col1'])
df2
Note that the above methods only convert the strings to datetimes and return them in df2. In short, df2 is a standalone Series (still named col1) holding only the converted values; it is not part of the original DataFrame. If you want to keep the other columns of the DataFrame and give the converted column its own header, you can try the following:
df['col1_converted'] = pd.to_datetime(df.col1)
or
df['col1_converted'] = pd.to_datetime(df['col1'])
This is convenient if you don't want to create a separate DataFrame, or if you want to refer to the converted column later together with the other columns of the DataFrame.
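The difference between the two approaches can be seen in a short sketch. The extra column other is hypothetical and only there to show that existing columns survive the assignment; an explicit format is passed because the month is in uppercase.

```python
import pandas as pd

# Hypothetical frame: the question's columns plus one extra column
df = pd.DataFrame({
    'col1': ['04-APR-2018 11:04:29'],
    'col2': ['2018040415203'],
    'other': [1],
})

# On its own, to_datetime returns a separate Series that keeps the name 'col1'
converted = pd.to_datetime(df['col1'], format='%d-%b-%Y %H:%M:%S')
print(type(converted).__name__, converted.name)  # Series col1

# Assigning it adds a new column and leaves the existing ones untouched
df['col1_converted'] = converted
print(df.columns.tolist())  # ['col1', 'col2', 'other', 'col1_converted']
```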
Upvotes: 20
Reputation: 375915
For the first format you can simply pass the column to to_datetime; for the latter you need to describe the date format explicitly (see the table of available directives in the Python docs):
In [21]: df
Out[21]:
col1 col2
0 04-APR-2018 11:04:29 2018040415203
In [22]: pd.to_datetime(df.col1)
Out[22]:
0 2018-04-04 11:04:29
Name: col1, dtype: datetime64[ns]
In [23]: pd.to_datetime(df.col2, format="%Y%m%d%H%M%S")
Out[23]:
0 2018-04-04 15:20:03
Name: col2, dtype: datetime64[ns]
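If both columns should be converted in place, one option is a loop over a per-column format mapping. The formats dict is a hypothetical helper for this sketch, not a pandas feature. Note that col2 has only 13 digits, so %S consumes the single trailing 3, which is why the result shows 15:20:03.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['04-APR-2018 11:04:29'], 'col2': ['2018040415203']})

# Hypothetical mapping: one explicit format per column
formats = {'col1': '%d-%b-%Y %H:%M:%S', 'col2': '%Y%m%d%H%M%S'}

for col, fmt in formats.items():
    # Replace each object column with its datetime version
    df[col] = pd.to_datetime(df[col], format=fmt)

print(df)
```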
Upvotes: 45