Oliver

Reputation: 572

How to remove microseconds from a DatetimeIndex in a pandas DataFrame in Python?

I want to remove the microseconds from the index.

My index is like this:

DatetimeIndex(['2003-11-20 13:07:40.895000+00:00',
           '2003-11-20 13:16:13.039000+00:00',
           '2003-11-20 13:24:44.868000+00:00',
           '2003-11-20 13:33:17.013000+00:00',
           '2003-11-20 13:41:49.158000+00:00',
           '2003-11-20 13:50:20.987000+00:00',
           '2003-11-20 13:58:53.132000+00:00',
           '2003-11-20 14:07:24.961000+00:00',
           '2003-11-20 14:15:57.106000+00:00',
           '2003-11-20 14:24:28.935000+00:00',
           ...
           '2003-12-04 19:28:56.025000+00:00',
           '2003-12-04 19:37:27.854000+00:00',
           '2003-12-04 19:45:59.999000+00:00',
           '2003-12-04 19:54:32.143000+00:00',
           '2003-12-04 20:03:03.972000+00:00',
           '2003-12-04 20:11:36.117000+00:00',
           '2003-12-04 20:20:07.946000+00:00',
           '2003-12-04 20:28:40.091000+00:00',
           '2003-12-04 20:37:11.920000+00:00',
           '2003-12-04 20:45:44.065000+00:00'],
          dtype='datetime64[ns, UTC]'

I want to remove the microseconds so that each entry looks like '2003-12-04 20:45:44'. I do not want to convert the index to strings, because it is the index of the dataframe and needs to remain a datetime. I have been searching for a solution, but the only thing I found was this, which does not work:

df.index.replace(microsecond=0, inplace = True)
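For reference, `replace(microsecond=0)` is a method of individual `pd.Timestamp` objects: it returns a new timestamp and takes no `inplace` argument. A minimal sketch, assuming a small hypothetical frame `df` built from a few of the timestamps above, applies it element-wise with `Index.map` and assigns the result back to the index:

```python
import pandas as pd

# Hypothetical example frame using a few of the timestamps from the question
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.DatetimeIndex(
        [
            "2003-11-20 13:07:40.895000+00:00",
            "2003-11-20 13:16:13.039000+00:00",
            "2003-12-04 20:45:44.065000+00:00",
        ]
    ),
)

# Timestamp.replace returns a new Timestamp; Index.map applies it to every entry
# and the result is assigned back, since there is no in-place variant.
# Note: this zeroes only the microsecond field; a nanosecond component would remain.
df.index = df.index.map(lambda ts: ts.replace(microsecond=0))

print(df.index)
```

This keeps the UTC timezone and the datetime dtype; it works here because the values have no nanosecond part.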

Can you help me please?

Upvotes: 3

Views: 6596

Answers (2)

Scott Boston

Reputation: 153460

Given a pd.DatetimeIndex with timezone information and millisecond data like this:

import pandas as pd

didx = pd.DatetimeIndex(['2003-11-20 13:07:40.895000+00:00',
           '2003-11-20 13:16:13.039000+00:00',
           '2003-11-20 13:24:44.868000+00:00',
           '2003-11-20 13:33:17.013000+00:00',
           '2003-11-20 13:41:49.158000+00:00',
           '2003-11-20 13:50:20.987000+00:00',
           '2003-11-20 13:58:53.132000+00:00',
           '2003-11-20 14:07:24.961000+00:00',
           '2003-11-20 14:15:57.106000+00:00',
           '2003-11-20 14:24:28.935000+00:00',
           '2003-12-04 19:28:56.025000+00:00',
           '2003-12-04 19:37:27.854000+00:00',
           '2003-12-04 19:45:59.999000+00:00',
           '2003-12-04 19:54:32.143000+00:00',
           '2003-12-04 20:03:03.972000+00:00',
           '2003-12-04 20:11:36.117000+00:00',
           '2003-12-04 20:20:07.946000+00:00',
           '2003-12-04 20:28:40.091000+00:00',
           '2003-12-04 20:37:11.920000+00:00',
           '2003-12-04 20:45:44.065000+00:00'],
          dtype='datetime64[ns, UTC]')

You can use pd.DatetimeIndex.floor to truncate the timestamps to whole seconds, and tz_localize(None) to remove the timezone information:

didx.floor('s').tz_localize(None)

Output:

DatetimeIndex(['2003-11-20 13:07:40', '2003-11-20 13:16:13',
               '2003-11-20 13:24:44', '2003-11-20 13:33:17',
               '2003-11-20 13:41:49', '2003-11-20 13:50:20',
               '2003-11-20 13:58:53', '2003-11-20 14:07:24',
               '2003-11-20 14:15:57', '2003-11-20 14:24:28',
               '2003-12-04 19:28:56', '2003-12-04 19:37:27',
               '2003-12-04 19:45:59', '2003-12-04 19:54:32',
               '2003-12-04 20:03:03', '2003-12-04 20:11:36',
               '2003-12-04 20:20:07', '2003-12-04 20:28:40',
               '2003-12-04 20:37:11', '2003-12-04 20:45:44'],
              dtype='datetime64[ns]', freq=None)
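To apply this to the dataframe itself, assign the result back to `df.index`; the index stays a `DatetimeIndex`, so no string conversion is involved. A minimal sketch, assuming a hypothetical frame `df` holding a few of the timestamps above. Leaving out `tz_localize(None)` keeps the UTC timezone if that information should be preserved, and `round('s')` rounds to the nearest second instead of truncating:

```python
import pandas as pd

# Hypothetical frame indexed by a few of the timestamps from the question
df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.DatetimeIndex(
        [
            "2003-11-20 13:07:40.895000+00:00",
            "2003-12-04 19:45:59.999000+00:00",
            "2003-12-04 20:45:44.065000+00:00",
        ]
    ),
)

# Truncate to whole seconds but keep the UTC timezone
utc_seconds = df.index.floor("s")

# Truncate and drop the timezone, as in the answer above
naive_seconds = df.index.floor("s").tz_localize(None)

# round() goes to the nearest second instead, so 19:45:59.999 becomes 19:46:00
rounded = df.index.round("s")

# Replace the dataframe's index with the truncated, timezone-naive version
df.index = naive_seconds
print(df)
```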

Upvotes: 5

Joseph Hissong

Reputation: 3

You should be able to use .strftime('%Y-%m-%d %H:%M:%S') on the index, which formats each timestamp.
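A minimal sketch of that approach on two of the timestamps from the question. `DatetimeIndex.strftime` formats every entry at once, but it returns an `Index` of strings, not datetimes, which the question wanted to avoid. If a `DatetimeIndex` is still needed, the strings can be parsed back with `pd.to_datetime`, which yields timezone-naive values:

```python
import pandas as pd

didx = pd.DatetimeIndex(
    [
        "2003-11-20 13:07:40.895000+00:00",
        "2003-12-04 20:45:44.065000+00:00",
    ]
)

# Vectorised formatting: the result holds strings, not timestamps
as_strings = didx.strftime("%Y-%m-%d %H:%M:%S")
print(as_strings)

# Parse back to datetimes if the index must remain datetime-typed
as_datetimes = pd.to_datetime(as_strings)
print(as_datetimes)
```

The round trip gives the same values as flooring to seconds and dropping the timezone, but it goes through an intermediate string step.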

Upvotes: 0
