Reputation: 1667
Scenario: I have a dataframe with multiple columns retrieved from Excel worksheets. Some of these columns contain dates, where some values are plain dates (yyyy:mm:dd) and some are datetimes (yyyy:mm:dd 00.00.000000).
Question: How can I remove the time stamp from the dates when they are not the index of my dataframe?
What I already tried: From other posts here in SO (working with dates in pandas - remove unseen characters in datetime and convert to string and How to strip a pandas datetime of date, hours and seconds) I found:
pd.DatetimeIndex(dfST['timestamp']).date
and
strftime (df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d')))
But I can't seem to find a way to use those directly on the wanted column when it is not the index of my dataframe.
Upvotes: 36
Views: 113189
Reputation: 23031
You can also use dt.normalize() to convert times to midnight (null times don't render) or dt.floor to floor the frequency to daily:
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['timestamp'] = df['timestamp'].dt.normalize()
df['timestamp'] = df['timestamp'].dt.floor('D')
Note that this keeps the dtype of the column datetime64[ns] because each element is still of type pd.Timestamp, whereas dt.date suggested in Andrew L's post converts it to object because each element becomes type datetime.date.
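To illustrate the dtype difference described above, here is a minimal sketch. The sample values are made up for illustration and are not from the original post; the column name timestamp follows the question.

```python
import datetime

import pandas as pd

# Hypothetical sample data: every value carries a time component.
df = pd.DataFrame(
    {"timestamp": ["2021-03-01 14:30:00", "2021-03-02 00:00:00", "2021-03-03 23:59:59"]}
)
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Both calls reset the time to midnight and keep a datetime64 column.
normalized = df["timestamp"].dt.normalize()
floored = df["timestamp"].dt.floor("D")

# dt.date returns plain datetime.date objects, so the dtype becomes object.
dates = df["timestamp"].dt.date

print(normalized.dtype, floored.dtype, dates.dtype)
print(type(normalized.iloc[0]).__name__, type(dates.iloc[0]).__name__)
```

Keeping the datetime64 dtype means the .dt accessor and datetime arithmetic remain available on the column afterwards, which is not the case for the object column produced by dt.date.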
Also, it's worth noting that dt.normalize and dt.floor('D') are both significantly faster (approx. 10 times faster for longer dataframes) than dt.date.
Code used to produce the timings plot:
import pandas as pd
from perfplot import plot

# Compare the three approaches on columns of increasing length
plot(
    setup=lambda n: pd.Series([pd.Timestamp('now')] * n),
    kernels=[lambda s: s.dt.date, lambda s: s.dt.normalize(), lambda s: s.dt.floor('D')],
    labels=["col.dt.date", "col.dt.normalize()", "col.dt.floor('D')"],
    n_range=[2**k for k in range(21)],
    xlabel='Length of column',
    title='Removing Time From Datetime',
    equality_check=lambda x, y: all(x.eq(y)));
Upvotes: 4
Reputation: 7038
You can do the following:
dfST['timestamp'] = pd.to_datetime(dfST['timestamp'])
to_datetime() will infer the formatting of the date column. You can also pass errors='coerce' if the column contains non-date values.
After completing the above, you'll be able to create a new column containing only date values:
dfST['new_date_column'] = dfST['timestamp'].dt.date
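As a rough end-to-end sketch of the two steps above, assuming a column that contains one non-date value: the sample data is hypothetical, while dfST, timestamp and new_date_column are the names used in this answer.

```python
import datetime

import pandas as pd

# Hypothetical sample data with one value that is not a date.
dfST = pd.DataFrame(
    {"timestamp": ["2021-03-01 14:30:00", "not a date", "2021-03-03 08:00:00"]}
)

# errors='coerce' turns unparseable values into NaT instead of raising.
dfST["timestamp"] = pd.to_datetime(dfST["timestamp"], errors="coerce")

# The .dt accessor works on any datetime column; it does not need to be the index.
dfST["new_date_column"] = dfST["timestamp"].dt.date

print(dfST)
```

The coerced row stays missing (NaT) in the new column, so it can be found afterwards with isna() if those rows need separate handling.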
Upvotes: 63