n4cer500

Reputation: 771

Pandas 0.15 DataFrame: Remove or reset time portion of a datetime64

I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:

2014-06-30 21:50:00

I simply want to either remove the time or set the time to midnight:

2014-06-30 00:00:00 

What is the easiest way of doing this?

Upvotes: 7

Views: 13494

Answers (6)

Sebastian N

Reputation: 140

Since pd.datetools.normalize_date has been deprecated and you are working with the datetime64 data type, use:

df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))

This way you don't need to convert to pandas datetime first. If it's already a pandas datetime, see the answer from phil:

df.your_date_col = df.your_date_col.dt.normalize()
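A minimal sketch comparing the two approaches in this answer on a small Series; the column name `your_date_col` and the sample timestamps are hypothetical. The element-wise `replace` and the vectorized `dt.normalize()` should agree on every row:

```python
import pandas as pd

# Hypothetical sample data with a time component
df = pd.DataFrame({"your_date_col": pd.to_datetime(["2014-06-30 21:50:00",
                                                     "2014-07-01 08:15:30"])})

# Element-wise: each element is a pandas Timestamp, which supports replace()
by_apply = df["your_date_col"].apply(
    lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))

# Vectorized: the .dt accessor operates on the whole datetime64 column at once
by_normalize = df["your_date_col"].dt.normalize()

print((by_apply == by_normalize).all())  # True
```

The vectorized form avoids a Python-level function call per row, which is why it is usually preferred on large frames.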

Upvotes: 2

phil

Reputation: 2578

pd.datetools.normalize_date has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize() instead.

See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html
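A short sketch of `dt.normalize()` using a hypothetical `date_col` built from the question's example value. The column stays datetime64; for a timezone-aware column (here a hypothetical fixed +02:00 offset) the time is set to local midnight and the offset is kept:

```python
import pandas as pd

df = pd.DataFrame({"date_col": pd.to_datetime(["2014-06-30 21:50:00"])})
df["date_col"] = df["date_col"].dt.normalize()
print(df["date_col"].iloc[0])  # 2014-06-30 00:00:00

# Timezone-aware values: normalization happens in local (wall-clock) time
aware = pd.Series(pd.to_datetime(["2014-06-30 21:50:00+02:00"]))
print(aware.dt.normalize().iloc[0])  # 2014-06-30 00:00:00+02:00
```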

Upvotes: 5

Kathirmani Sukumar

Reputation: 10970

Use the .dt accessor methods, which are vectorized and therefore faster.

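import pandas as pd

# There are better ways of converting it into a datetime column.
# Ignore those to keep it simple
data['date_column'] = pd.to_datetime(data['date_column'])
# .dt.date keeps only the date part (as Python datetime.date objects)
data['date_column'] = data['date_column'].dt.date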
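Note that `.dt.date` yields Python `datetime.date` objects, so the column becomes `object` dtype and loses the `.dt` accessor. If the column should stay datetime64 with the time at midnight (the question's second option), `.dt.normalize()` or `.dt.floor("D")` are alternatives. A sketch with hypothetical sample values:

```python
import pandas as pd

data = pd.DataFrame({"date_column": ["2014-06-30 21:50:00", "2014-07-01 08:15:30"]})
data["date_column"] = pd.to_datetime(data["date_column"])

as_dates = data["date_column"].dt.date          # Python datetime.date objects
midnight = data["date_column"].dt.normalize()   # still datetime64, time 00:00:00
floored = data["date_column"].dt.floor("D")     # same values for tz-naive data

print(as_dates.dtype)            # object
print(type(as_dates.iloc[0]))    # <class 'datetime.date'>
print(midnight.equals(floored))  # True
```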

Upvotes: 8

Kevin S

Reputation: 2803

The fastest way I have found to strip everything but the date is to use the underlying NumPy structure of pandas Timestamps.

import numpy as np
import pandas as pd
dates = pd.to_datetime(['1990-1-1 1:00:11',
                        '1991-1-1',
                        '1999-12-31 12:59:59.999'])
dates

DatetimeIndex(['1990-01-01 01:00:11', '1991-01-01 00:00:00',
           '1999-12-31 12:59:59.999000'],
           dtype='datetime64[ns]', freq=None)

dates = dates.astype(np.int64)
ns_in_day = 24*60*60*np.int64(1e9)
dates //= ns_in_day
dates *= ns_in_day
dates = dates.astype(np.dtype('<M8[ns]'))
dates = pd.Series(dates)
dates

0   1990-01-01
1   1991-01-01
2   1999-12-31
dtype: datetime64[ns]

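This might not give the expected result for timezone-aware data: the int64 values are nanoseconds since the epoch in UTC, so the flooring happens at UTC midnight rather than local midnight.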
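The same integer-flooring trick can be wrapped in a small helper; `floor_to_midnight` is a hypothetical name, and this sketch assumes tz-naive data. It converts to nanosecond int64 values explicitly, floors to whole days, and checks the result against `normalize()` (including a pre-1970 timestamp, where floor division still rounds down correctly):

```python
import numpy as np
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9

def floor_to_midnight(idx: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Drop the time of day from a tz-naive DatetimeIndex via integer division."""
    # Force nanosecond resolution, then view as int64 ns since 1970-01-01
    ns = idx.values.astype("datetime64[ns]").astype(np.int64)
    # Floor division rounds toward -inf, so dates before 1970 also land on midnight
    days = ns // NS_PER_DAY
    return pd.DatetimeIndex((days * NS_PER_DAY).astype("datetime64[ns]"))

dates = pd.DatetimeIndex(np.array(["1990-01-01T01:00:11", "1991-01-01",
                                   "1999-12-31T12:59:59.999", "1969-12-31T23:00"],
                                  dtype="datetime64[ns]"))
result = floor_to_midnight(dates)
print(result.equals(dates.normalize()))  # True
```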

Upvotes: 0

Frank

Reputation: 469

Pandas has a built-in function, pd.datetools.normalize_date, for that purpose:

df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)

It's implemented in Cython and does the following:

if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))
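Since `pd.datetools.normalize_date` has since been deprecated (as the other answers note), the Cython logic above can be mirrored in plain Python; `normalize_date` below is a hypothetical stand-alone re-implementation, not the pandas function. The `datetime` check comes first because `datetime` is a subclass of `date`, matching the order of the Cython branches, and a pandas `Timestamp` (a `datetime` subclass) takes that first branch:

```python
from datetime import date, datetime

import pandas as pd

def normalize_date(dt):
    """Return dt with its time set to midnight, mirroring the Cython branches."""
    # datetime (including pandas Timestamp): zero out the time fields
    if isinstance(dt, datetime):
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    # plain date: promote to a datetime at midnight
    if isinstance(dt, date):
        return datetime(dt.year, dt.month, dt.day)
    raise TypeError("Unrecognized type: %s" % type(dt))

df = pd.DataFrame({"date_col": pd.to_datetime(["2014-06-30 21:50:00"])})
df["date_col"] = df["date_col"].apply(normalize_date)
print(df["date_col"].iloc[0])  # 2014-06-30 00:00:00
```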

Upvotes: 13

EdChum

Reputation: 393933

I can think of two ways: assign just the date() attribute (to the same column or a new one), or call replace on the datetime object, passing hour=0, minute=0:

In [106]:
import io
import pandas as pd
# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[106]:
             datetime
0 2014-06-30 21:50:00
In [107]:
# apply a lambda accessing just the date() attribute
df['datetime'] = df['datetime'].apply( lambda x: x.date() )
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with params hour=0, minute=0
df['datetime'] = df['datetime'].apply( lambda x: x.replace(hour=0, minute=0) )
df

     datetime
0  2014-06-30
Out[107]:
    datetime
0 2014-06-30
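One caveat with the `replace` variant: passing only `hour=0, minute=0` leaves any seconds and microseconds in place. That makes no difference for the sample row above, but for a value such as the hypothetical `21:50:42.5` below, all four time fields need zeroing:

```python
import pandas as pd

ts = pd.Timestamp("2014-06-30 21:50:42.500000")

# Only hour and minute are reset; seconds and microseconds survive
partial = ts.replace(hour=0, minute=0)
# Resetting every time field gives true midnight
full = ts.replace(hour=0, minute=0, second=0, microsecond=0)

print(partial)
print(full)
```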

Upvotes: 3
