Reputation: 485
I'm trying to understand why I get this error. I already have a solution for this issue (it was actually solved here); I just need to understand why it doesn't work as I expected.
I would like to understand why this throws a KeyError:
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))
df.loc[['20130102', '20130103'],:]
with the following feedback:
KeyError: "None of [['20130102', '20130103']] are in the [index]"
As explained here, the solution is just to do:
df.loc[pd.to_datetime(['20130102','20130104']),:]
So the problem is definitely with the way loc takes a list of strings as an argument for selecting from a DatetimeIndex. However, I can see that the following calls work fine:
df.loc['20130102':'20130104',:]
and
df.loc['20130102']
I would like to understand how this works and would appreciate any resources I can use to predict the behavior of this function depending on how it is called. I read Indexing and Selecting Data and Time Series/Date functionality in the pandas documentation but couldn't find an explanation for this.
Upvotes: 3
Views: 3376
Reputation: 294488
Typically, when you pass an array-like object to loc, Pandas is going to try to locate each element of that array in the index. If it doesn't find it, you'll get a KeyError. And you passed an array of strings when the values in the index are Timestamps, so those strings definitely aren't in the index.
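To make the type mismatch concrete, here is a minimal sketch that rebuilds the DataFrame from the question and converts the labels with pd.to_datetime before handing them to loc. The variable names (labels, keys, selected, raw_strings_accepted) are only for illustration. Whether the raw string list raises KeyError may depend on the pandas version, so the sketch only records what happens instead of assuming an outcome.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

labels = ['20130102', '20130103']

# The index stores Timestamp objects, not strings.
print(type(df.index[0]))

# Convert the strings explicitly so the labels have the same type as the index.
keys = pd.to_datetime(labels)
selected = df.loc[keys, :]
print(selected)

# Passing the raw strings raised KeyError in the question; this may differ
# between pandas versions, so we only record the outcome here.
try:
    df.loc[labels, :]
    raw_strings_accepted = True
except KeyError:
    raw_strings_accepted = False
print("raw string list accepted:", raw_strings_accepted)
```

Converting up front keeps the lookup independent of how much string parsing a given pandas version does for list-like keys.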
However, Pandas also tries to make things easier for you. In particular, with a DatetimeIndex, if you were to pass a string scalar:
df.loc['20130102']
A    0.0
B    1.0
C    0.0
D    0.0
Name: 2013-01-02 00:00:00, dtype: float64
Pandas will attempt to parse that scalar as a Timestamp and see if that value is in the index.
If you were to pass a slice object:
df.loc['20130102':'20130104']
              A    B    C    D
2013-01-02  0.0  1.0  0.0  0.0
2013-01-03  0.0  0.0  1.0  0.0
2013-01-04  0.0  0.0  0.0  1.0
Pandas will also attempt to parse the endpoints of the slice object as Timestamps and return an appropriately sliced dataframe.
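As a rough check of the scalar and slice cases described above, the sketch below compares the string forms against explicitly constructed Timestamp labels on the same frame. The variable names are illustrative only, and it assumes the daily index from the question, where the string '20130102' identifies exactly one row.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Scalar: the string is parsed to a Timestamp and looked up in the index.
row_from_string = df.loc['20130102']
row_from_timestamp = df.loc[pd.Timestamp('2013-01-02')]
print(row_from_string.equals(row_from_timestamp))

# Slice: both endpoints are parsed, and label slices with loc include both ends.
rows_from_strings = df.loc['20130102':'20130104']
rows_from_timestamps = df.loc[pd.Timestamp('2013-01-02'):pd.Timestamp('2013-01-04')]
print(rows_from_strings.equals(rows_from_timestamps))
```

Writing the Timestamp version out by hand shows what the string shortcut stands for in these two cases.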
Your KeyError is simply past the limits of how much helpfulness the Pandas devs had time to code.
Upvotes: 2