fiktor

Reputation: 1413

Why does pandas generate a KeyError when looking up date in date-indexed table?

Consider the following code:

import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])

One might expect it to print 1. Yet (at least with Anaconda Python 3.7.3 and pandas 0.24.2) it fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/pandas/core/indexing.py", line 2270, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File ".../site-packages/pandas/core/frame.py", line 2771, in _get_value
    return engine.get_value(series._values, index)
  File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 447, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 17897
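The number in `KeyError: 17897` is a clue. A `datetime64[D]` value is stored as an int64 count of days since 1970-01-01, and 2019-01-01 is exactly day 17897. The traceback ends in `Int64HashTable.get_item`, so a plausible reading (an inference from the traceback, not confirmed against the pandas source) is that the raw day count was looked up in a table keyed by nanosecond values. The sketch below just shows the two int64 representations side by side:

```python
import numpy as np

# A datetime64[D] scalar stores whole days since 1970-01-01 as an int64.
day_key = np.datetime64('2019-01-01', 'D')
raw_days = day_key.astype('int64')
print(raw_days)  # 17897, the number shown in the KeyError

# The same instant at nanosecond resolution has a very different int64 value,
# which is what a datetime64[ns] index would hold for this date.
raw_ns = day_key.astype('datetime64[ns]').astype('int64')
print(raw_ns)    # 1546300800000000000
```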

It appears that pandas DataFrame and Series objects always store dates with dtype 'datetime64[ns]' or 'datetime64[ns, tz]'. The issue arises because pandas automatically converts the 'datetime64[D]' dtype to 'datetime64[ns]' when creating the index, but does not do so when looking up an element in that index. I could avoid the error above by converting the key to 'datetime64[ns]'; for example, both of the following lines successfully print 1:

print(df.at[pd.to_datetime(date_to_lookup), 'a'])
print(df.at[date_to_lookup.astype('datetime64[ns]'), 'a'])
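To see the mismatch directly, one can compare the dtype of the key with the dtype of the index. This is a minimal sketch reusing the question's setup; converting with `pd.Timestamp` is one way to normalize the key (equivalent in effect to the `pd.to_datetime` line above). Note that newer pandas versions may convert a day-resolution index to a unit other than `ns`, so the printed index dtype is version-dependent:

```python
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]

# The key keeps NumPy's day resolution...
print(date_to_lookup.dtype)   # datetime64[D]
# ...while the index was converted to a resolution pandas supports
# (datetime64[ns] in pandas 0.24; other versions may pick another unit).
print(df.index.dtype)

# pd.Timestamp turns the NumPy scalar into a pandas scalar the index accepts.
key = pd.Timestamp(date_to_lookup)
print(df.at[key, 'a'])        # 1
```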

This behavior (automatic dtype conversion when creating an index, but not when looking up an element) seems counterintuitive to me. Why was it implemented this way? Is there a coding style one is expected to follow to avoid errors like this, or is this a bug I should file?

Upvotes: 4

Views: 881

Answers (2)

oppressionslayer

Reputation: 7224

I think this is a bug in pandas 0.24.2. The same code works on my system with Python 3.7.2 and pandas 0.25.3:

import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])
1
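Since the behavior seems to depend on the pandas version, a small check like the following can tell you which behavior your installation has. `day_key_lookup_works` is a hypothetical helper name; per the question it would return False on 0.24.2, and per this answer True on 0.25.3, but I have not verified other versions:

```python
import numpy as np
import pandas as pd

def day_key_lookup_works():
    """Hypothetical helper: True if .at accepts a raw datetime64[D] key here."""
    date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
    df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
    try:
        # Same lookup as in the question, with the unconverted NumPy key.
        return bool(df.at[date_index[0], 'a'] == 1)
    except KeyError:
        # The failure mode reported for pandas 0.24.2.
        return False

print(pd.__version__, day_key_lookup_works())
```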

Upvotes: 2

jezrael

Reputation: 863166

You can avoid this by selecting by position with DataFrame.iat, using Index.get_loc to get the position of column a:

print(df.iat[0, df.columns.get_loc('a')])
# alternative:
# print(df.iloc[0, df.columns.get_loc('a')])
1
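The same positional idea extends to looking up a particular date rather than row 0: `Index.get_loc` on the index gives the row position for a label. This sketch normalizes the key with `pd.Timestamp` first (my addition, to sidestep the resolution mismatch from the question) and then reads the cell with `iat`:

```python
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[1]

# Row position of the date label (unique index, so get_loc returns an int).
row = df.index.get_loc(pd.Timestamp(date_to_lookup))
# Column position of 'a'.
col = df.columns.get_loc('a')
print(df.iat[row, col])  # 2
```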

Another option is to use df.index for the lookup instead of date_index[0]:

print(df.at[df.index[0], 'a'])
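This works because elements of a DatetimeIndex come back as `pd.Timestamp` objects that already match the index's resolution. A minimal sketch, assuming you want to visit every row by its date label:

```python
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)

# Labels taken from the index are pandas Timestamps, not NumPy datetime64[D].
first_label = df.index[0]
print(type(first_label).__name__)  # Timestamp

# Iterating over df.index yields the same kind of label, so .at accepts each one.
for label in df.index:
    print(label.date(), df.at[label, 'a'])
```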

Upvotes: 3
