Reputation: 771
I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:
2014-06-30 21:50:00
I simply want to either remove the time or set the time to midnight:
2014-06-30 00:00:00
What is the easiest way of doing this?
Upvotes: 7
Views: 13494
Reputation: 140
Since pd.datetools.normalize_date
has been deprecated and you are working with the datetime64
data type, use:
df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))
This way you don't need to convert to a pandas datetime first. If the column is already a pandas datetime, see Phil's answer:
df.your_date_col = df.your_date_col.dt.normalize()
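As a quick sanity check, here is a minimal sketch comparing the two approaches on a datetime64 column. The column name `your_date_col` is the placeholder from this answer and the sample values are made up for illustration.

```python
import pandas as pd

# Sample values are invented for illustration only.
df = pd.DataFrame({
    "your_date_col": pd.to_datetime(["2014-06-30 21:50:00", "2014-07-01 08:15:30"])
})

# Element-wise: reset every time component on each Timestamp.
via_apply = df["your_date_col"].apply(
    lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0)
)

# Vectorized: the .dt accessor sets the time to midnight for the whole column.
via_normalize = df["your_date_col"].dt.normalize()

# Compare element by element; True when both approaches agree.
same_result = bool((via_apply == via_normalize).all())
print(same_result)
```

On data like this both give the same values, so the choice comes down mostly to readability and speed.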
Upvotes: 2
Reputation: 2578
pd.datetools.normalize_date
has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize()
instead.
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html
Upvotes: 5
Reputation: 10970
Use the dt accessor methods, which are vectorized and yield faster results.
# There are better ways of converting it into a datetime column;
# they are ignored here to keep things simple.
data['date_column'] = pd.to_datetime(data['date_column'])
data['date_column'].dt.date
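One thing worth knowing before picking `.dt.date`: as far as I know it hands back plain Python `datetime.date` objects, so the result is no longer a datetime64 column, whereas `.dt.normalize()` keeps the datetime dtype. A small sketch, with invented sample values:

```python
import datetime
import pandas as pd

# Sample values are invented for illustration only.
data = pd.DataFrame({"date_column": ["2014-06-30 21:50:00", "2014-07-01 08:15:30"]})
data["date_column"] = pd.to_datetime(data["date_column"])

# Drops the time entirely: each element becomes a datetime.date.
date_only = data["date_column"].dt.date

# Keeps a datetime column with the time set to midnight.
midnight = data["date_column"].dt.normalize()

print(date_only.dtype)
print(midnight.dtype)
```

If you still need datetime operations afterwards (resampling, `.dt` accessors, date arithmetic), the normalized column is probably the more convenient of the two.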
Upvotes: 8
Reputation: 2803
The fastest way I have found to strip everything but the date is to use the underlying NumPy structure of pandas Timestamps.
import numpy as np
import pandas as pd
dates = pd.to_datetime(['1990-1-1 1:00:11',
'1991-1-1',
'1999-12-31 12:59:59.999'])
dates
DatetimeIndex(['1990-01-01 01:00:11', '1991-01-01 00:00:00',
'1999-12-31 12:59:59.999000'],
dtype='datetime64[ns]', freq=None)
dates = dates.astype(np.int64)
ns_in_day = 24*60*60*np.int64(1e9)
dates //= ns_in_day
dates *= ns_in_day
dates = dates.astype(np.dtype('<M8[ns]'))
dates = pd.Series(dates)
dates
0 1990-01-01
1 1991-01-01
2 1999-12-31
dtype: datetime64[ns]
This might not work when the data carry timezone information, because the integer values count nanoseconds since the epoch in UTC.
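To illustrate the timezone caveat, here is a sketch of how I read it, using a single timezone-aware Timestamp. The zone `America/New_York` and the sample time are my own assumptions, not part of the answer. Integer truncation lands on UTC midnight, which is not local midnight.

```python
import pandas as pd

ns_in_day = 24 * 60 * 60 * 10**9

# Assumed sample: 21:50 local time in New York (UTC-4 on this date).
ts = pd.Timestamp("2014-06-30 21:50:00", tz="America/New_York")

# Integer truncation works on nanoseconds since the epoch in UTC,
# so it rounds down to midnight UTC.
truncated = pd.Timestamp(ts.value // ns_in_day * ns_in_day, tz="UTC")

# normalize() rounds down to midnight in the timestamp's own zone.
normalized = ts.normalize()

print(truncated.tz_convert("America/New_York"))
print(normalized)
```

For timezone-naive data the two agree, which is why the trick works in the example above.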
Upvotes: 0
Reputation: 469
Pandas has a built-in function, pd.datetools.normalize_date, for that purpose (since deprecated, as other answers here note):
df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)
It's implemented in Cython and does the following:
if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))
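For anyone on a pandas version where that function is gone, the quoted logic is easy to mirror in plain Python. This is only a sketch: the function below is a hypothetical stand-in written from the snippet above, not the pandas implementation itself.

```python
from datetime import date, datetime


def normalize_date(dt):
    """Hypothetical pure-Python stand-in for the quoted Cython logic."""
    # datetime is checked first because datetime is a subclass of date.
    if isinstance(dt, datetime):
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elif isinstance(dt, date):
        return datetime(dt.year, dt.month, dt.day)
    else:
        raise TypeError('Unrecognized type: %s' % type(dt))


print(normalize_date(datetime(2014, 6, 30, 21, 50)))
```

It can be passed to `apply` the same way as the original function, though the vectorized `.dt.normalize()` is likely the better choice on large columns.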
Upvotes: 13
Reputation: 393933
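I can think of two ways: assign the result of calling date() on each value back to the column (or to a new column), or call replace on each datetime object and pass hour=0, minute=0. Note that replace only resets the fields you name, so also pass second=0 and microsecond=0 if your data has them: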
In [106]:
import io
import pandas as pd

# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[106]:
datetime
0 2014-06-30 21:50:00
In [107]:
# apply a lambda that calls date() on each value
df['datetime'] = df['datetime'].apply( lambda x: x.date() )
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with params hour=0, minute=0
df['datetime'] = df['datetime'].apply( lambda x: x.replace(hour=0, minute=0) )
df
datetime
0 2014-06-30
Out[107]:
datetime
0 2014-06-30
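A small caveat on the replace approach, shown as a sketch with an invented timestamp: replace only resets the fields that are passed, so a value with non-zero seconds keeps them unless they are named too.

```python
import pandas as pd

# Invented sample with non-zero seconds.
ts = pd.Timestamp("2014-06-30 21:50:37")

# Only hour and minute are reset; the seconds survive.
partial = ts.replace(hour=0, minute=0)

# Naming every sub-day field gives a true midnight.
full = ts.replace(hour=0, minute=0, second=0, microsecond=0, nanosecond=0)

print(partial)
print(full)
```

The example data in this answer has zero seconds, which is why hour=0, minute=0 is enough there.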
Upvotes: 3