Reputation: 1413
Consider the following code:
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])
One might expect it to work and print 1. Yet (at least in Anaconda Python 3.7.3 with pandas 0.24.2) it fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../site-packages/pandas/core/indexing.py", line 2270, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File ".../site-packages/pandas/core/frame.py", line 2771, in _get_value
return engine.get_value(series._values, index)
File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 447, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 987, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 993, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 17897
It appears that pandas DataFrame and Series objects always store dates as dtype 'datetime64[ns]' or 'datetime64[ns, tz]', and the issue arises because pandas automatically converts the 'datetime64[D]' dtype to 'datetime64[ns]' when creating the index but does not do so when looking up an element in that index. I could avoid the error above by converting the key to 'datetime64[ns]'. For example, both of the following lines successfully print 1:
print(df.at[pd.to_datetime(date_to_lookup), 'a'])
print(df.at[date_to_lookup.astype('datetime64[ns]'), 'a'])
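The conversion above can be wrapped in a small helper so that every lookup goes through the same normalization. This is only a minimal sketch: the name as_index_key is hypothetical (not a pandas function), and the index dtype is printed rather than assumed, because the resolution pandas picks for the index may differ between versions.

```python
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)

# The NumPy array keeps day resolution; pandas converts the index on creation.
# The resulting index dtype may depend on the pandas version, so just print it.
print(date_index.dtype)
print(df.index.dtype)


def as_index_key(key):
    """Hypothetical helper: coerce a datetime-like key to pd.Timestamp."""
    return pd.Timestamp(key)


# Look up with the normalized key instead of the raw np.datetime64 scalar.
value = df.at[as_index_key(date_index[0]), 'a']
print(value)
```

Passing keys through pd.Timestamp keeps the lookup independent of whatever unit the original NumPy scalar carried.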
This behavior (automatic dtype conversion when creating an index, but not when looking up an element) seems counterintuitive to me. What is the reason it was implemented this way? Is there some coding style one is expected to follow to avoid errors like this? Or is it a bug I should file?
Upvotes: 4
Views: 881
Reputation: 7224
I think you found a bug in 0.24.2; it works on my system with Python 3.7.2 and pandas 0.25.3:
date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)
date_to_lookup = date_index[0]
print(df.at[date_to_lookup, 'a'])
1
Upvotes: 2
Reputation: 863166
You can avoid this by selecting by position with DataFrame.iat, using Index.get_loc to find the position of column a:
print(df.iat[0, df.columns.get_loc('a')])
# alternative:
# print(df.iloc[0, df.columns.get_loc('a')])
1
Another idea is to take the key from df.index instead of date_index[0]:
print(df.at[df.index[0], 'a'])
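If the row position is not known in advance, the two ideas can be combined: resolve both positions with Index.get_loc and then read the cell with DataFrame.iat. This is a minimal sketch that assumes the index is unique (so get_loc returns a single integer) and that the key is first converted to pd.Timestamp; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

date_index = np.array(['2019-01-01', '2019-01-02'], dtype=np.datetime64)
df = pd.DataFrame({'a': np.array([1, 2])}, index=date_index)

# Convert the NumPy key to a pd.Timestamp before asking the index for its position.
row_pos = df.index.get_loc(pd.Timestamp(date_index[0]))
col_pos = df.columns.get_loc('a')

# Purely positional access, so the dtype of the original key no longer matters.
value = df.iat[row_pos, col_pos]
print(value)
```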
Upvotes: 3