Bogdanovist

Reputation: 1546

loc fails on a DataFrame using the DataFrame's own index?

I have a DataFrame with a DatetimeIndex that contains many duplicate labels (i.e. rows with the same datetime). I want to process together the rows that share each datetime, so I have the following:

utimes = pd.unique(data.index.tolist())
for time in utimes:
    data_now = data.loc[time]
    # Do some processing on the data_now

This fails with an error such as: KeyError 'the label [2015-02-05 21:54:00+00:00] is not in the [index]'

To check that the problem isn't in how utimes is created, I tried the following, which also fails:

data.loc[data.index[0]]

with the same error message. How can this be? Here's what the index looks like:

> data.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 21:54:00+00:00, ..., 2015-02-05 23:24:00+00:00]  
Length: 457, Freq: None, Timezone: UTC

and

> data.index[0]
Timestamp('2015-02-05 22:24:00+0000', tz='UTC')

Any ideas why I can't use .loc with a DataFrame's own index?

Upvotes: 2

Views: 1020

Answers (1)

Andy Hayden

Reputation: 375515

It looks like pd.unique does not respect the datetime64 dtype:

In [11]: df.index
Out[11]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00+00:00]
Length: 1, Freq: None, Timezone: UTC

In [12]: pd.unique(df.index)
Out[12]: array([1423175040000000000L], dtype=object)

For now (until this bug is fixed in pandas) you can wrap this in a to_datetime call:

In [13]: pd.to_datetime(pd.unique(df.index))
Out[13]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00]
Length: 1, Freq: None, Timezone: None
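
Note that the result above is timezone-naive (Timezone: None), so it still would not match a UTC-aware index. As a minimal sketch, assuming the integers returned by pd.unique are nanoseconds since the epoch in UTC (which the output above suggests), passing utc=True to to_datetime should give back UTC-aware values. The variable names here are illustrative only:

```python
import numpy as np
import pandas as pd

# Raw value as printed by pd.unique above; assumed to be ns since the epoch, UTC.
raw = np.array([1423175040000000000], dtype="int64")

# utc=True localizes the parsed timestamps to UTC instead of leaving them naive.
restored = pd.to_datetime(raw, utc=True)

# A tz-aware index to compare against, like the one in the question.
index = pd.DatetimeIndex(["2015-02-05 22:24:00"], tz="UTC")
matches = restored[0] == index[0]
```

With the naive result, the same comparison would not be between like values, which is why the labels are not found by .loc.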

or, more cleanly, you can use the unique method of DatetimeIndex, which keeps both the dtype and the timezone:

In [14]: df.index.unique()
Out[14]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-05 22:24:00+00:00]
Length: 1, Freq: None, Timezone: UTC
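
Applied to the loop in the question, this might look like the sketch below. The sample frame and the "value" column are made up for illustration; only the use of data.index.unique() with .loc comes from the approach above. Grouping on the index level is shown as an alternative that avoids the explicit lookup altogether:

```python
import pandas as pd

# Hypothetical sample data: a UTC-aware index with a duplicated label.
idx = pd.DatetimeIndex(
    ["2015-02-05 21:54:00", "2015-02-05 21:54:00", "2015-02-05 22:24:00"],
    tz="UTC",
)
data = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# unique() on the index itself returns a tz-aware DatetimeIndex.
utimes = data.index.unique()

sizes = {}
for time in utimes:
    # Passing a list to .loc returns a DataFrame even if only one row matches.
    data_now = data.loc[[time]]
    sizes[time] = len(data_now)

# Alternative: let groupby collect the rows that share each index label.
group_sizes = {time: len(group) for time, group in data.groupby(level=0)}
```

Selecting with data.loc[[time]] rather than data.loc[time] keeps data_now a DataFrame in every iteration, which can make the processing step simpler when some labels occur only once.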

Upvotes: 4
