pandas DataFrame selecting list of rows from DateTimeIndex - KeyError. Understanding why

Question

I'm trying to understand why I get this error. I already have a solution for this issue (it was actually solved here); I just need to understand why it doesn't work as I expected.

I would like to understand why this throws a KeyError:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))
df.loc[['20130102', '20130103'],:]

with the following feedback:

KeyError: "None of [['20130102', '20130103']] are in the [index]"

As explained here, the solution is just to do:

df.loc[pd.to_datetime(['20130102','20130104']),:]

So the problem is definitely with the way loc takes a list of strings as an argument for selecting from a DatetimeIndex. However, I can see that the following calls work fine:

df.loc['20130102':'20130104',:]

and

df.loc['20130102']

I would like to understand how this works, and would appreciate any resources I can use to predict the behavior of this function depending on how it is called. I read Indexing and Selecting Data and Time Series/Date functionality in the pandas documentation but couldn't find an explanation for this.

piRSquared · Accepted Answer

Typically, when you pass an array-like object to loc, Pandas tries to locate each element of that array in the index. If it doesn't find one, you get a KeyError. You passed an array of strings when the values in the index are Timestamps, so those strings definitely aren't in the index.
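To make the type mismatch concrete, here is a minimal sketch using the frame from the question. The names labels, converted, subset and raw_list_ok are just illustrative. The unconverted list is wrapped in try/except because whether it raises may differ between pandas versions, which I haven't checked exhaustively.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

labels = ['20130102', '20130103']

# The index holds Timestamp objects, while the labels are plain str objects.
print(type(df.index[0]).__name__)
print(type(labels[0]).__name__)

# Converting the list explicitly gives loc labels of the same type as the index.
converted = pd.to_datetime(labels)
subset = df.loc[converted, :]
print(subset)

# Assumption: the raw list may or may not be accepted depending on the
# pandas version, so record the outcome instead of relying on either one.
try:
    df.loc[labels, :]
    raw_list_ok = True
except KeyError:
    raw_list_ok = False
print("list of strings accepted:", raw_list_ok)
```

Converting up front with pd.to_datetime avoids depending on whatever the installed version does with the raw strings.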

However, Pandas also tries to make things easier for you. In particular, with a DatetimeIndex, if you were to pass a string scalar

df.loc['20130102']

A    0.0
B    1.0
C    0.0
D    0.0
Name: 2013-01-02 00:00:00, dtype: float64

Pandas will attempt to parse that scalar as a Timestamp and see if that value is in the index.
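As a rough check of the scalar case, the sketch below compares the string lookup with an explicit pd.Timestamp lookup on the same frame. The names by_string and by_timestamp are only for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# String scalar: pandas parses it into a Timestamp before the lookup.
by_string = df.loc['20130102']

# Explicit Timestamp: no parsing needed, the label already matches the index type.
by_timestamp = df.loc[pd.Timestamp('20130102')]

# Both lookups should select the same row.
print(by_string.equals(by_timestamp))
```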

If you were to pass a slice object

df.loc['20130102':'20130104']

              A    B    C    D
2013-01-02  0.0  1.0  0.0  0.0
2013-01-03  0.0  0.0  1.0  0.0
2013-01-04  0.0  0.0  0.0  1.0

Pandas will also attempt to parse the endpoints of the slice object as Timestamps and return an appropriately sliced dataframe.
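The same comparison can be sketched for slices. Again by_string and by_timestamp are illustrative names. Note that label slices with loc include both endpoints.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Slice with string endpoints: each endpoint is parsed as a Timestamp.
by_string = df.loc['20130102':'20130104']

# Slice with explicit Timestamp endpoints.
by_timestamp = df.loc[pd.Timestamp('20130102'):pd.Timestamp('20130104')]

# Both slices cover the same rows, endpoints included.
print(by_string.equals(by_timestamp))
print(len(by_string))
```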

Your KeyError is simply past the limits of how much helpfulness the Pandas devs had time to code.
