I'm having a weird problem with the np.isin function. Suppose I create a short pd.DatetimeIndex and a date that exists within that index:
import numpy as np
import pandas as pd
test_index = pd.date_range(start='2000-01-01', end='2000-01-15', freq='B')
test_date = test_index[0]
I can check that the test_date is in fact the first element of the index:
test_date == test_index[0]
True
But the np.isin function seems to be unable to recognize test_date within test_index:
np.isin(test_index, test_date)
array([False, False, False, False, False, False, False, False, False,
False])
The same thing happens if I write it as
np.isin(test_index.values, test_date)
This seems wrong and weird. The type of both test_date and test_index[0] is reported as pd.Timestamp, and there's no visible difference between them. Any help gratefully received.
This isn't a numpy issue, it's a pandas issue. The problem is that pd.date_range creates a DatetimeIndex, which is a special type of index that stores its values differently from the objects you get back when you access them. From the docs on DatetimeIndex:
Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
That is hard to parse. Roughly: "an array of type1 data, represented as type2, that gives you type3 objects when you index." Here type1 is datetime64, type2 is int64, and type3 is Timestamp.
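To make the type1/type2/type3 reading concrete, here is a minimal sketch assuming a reasonably recent pandas and NumPy. The names `raw`, `ticks` and `boxed` are purely illustrative. The exact datetime64 unit can differ between pandas versions, so the checks look at the kind of dtype rather than a specific unit.

```python
import numpy as np
import pandas as pd

test_index = pd.date_range(start='2000-01-01', end='2000-01-15', freq='B')

raw = test_index.values    # "type1": a NumPy datetime64 array
ticks = raw.view('i8')     # "type2": the same buffer reinterpreted as int64 ticks since the epoch
boxed = test_index[0]      # "type3": indexing boxes a single element into a pandas Timestamp

print(raw.dtype, ticks.dtype, type(boxed))
```

The point is that nothing called "Timestamp" lives inside the array; the Timestamp is created on the fly each time you index.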
In fact, pandas does not give me the same type for both: the index's dtype is datetime64[ns], while test_date is a pandas._libs.tslib.Timestamp (in pandas 0.22.0), which is in line with this documentation.
>>> test_index.dtype
dtype('<M8[ns]')
>>> type(test_date)
pandas._libs.tslib.Timestamp
As the docs state, this Timestamp carries additional metadata, and it does not convert well to numpy:
>>> np.array(test_date)
array(Timestamp('2000-01-03 00:00:00', freq='B'), dtype=object)
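A small sketch (assuming current pandas/NumPy; `implicit` and `explicit` are illustrative names) that compares what NumPy builds from the raw Timestamp with what it builds after converting it via `Timestamp.to_datetime64()`:

```python
import numpy as np
import pandas as pd

test_index = pd.date_range(start='2000-01-01', end='2000-01-15', freq='B')
test_date = test_index[0]

# Handing the Timestamp to NumPy as-is wraps it in an object array
implicit = np.asarray(test_date)
# Converting to a NumPy datetime64 scalar first yields a proper datetime64 array
explicit = np.asarray(test_date.to_datetime64())

print(implicit.dtype, explicit.dtype)
```

Only the second form has a dtype that can be compared value-for-value with `test_index.values`.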
You can see I just got an object array, and that object is definitely not what is stored in the DatetimeIndex. This is what happens implicitly inside numpy. From the docs on np.isin() (in the Notes section):
If test_elements is a set (or other non-sequence collection) it will be converted to an object array with one element.
So the value is being put into an object array instead of a datetime64 array, which is why it is never found in test_index.
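If you want to keep using np.isin with several dates at once, one option is to build the test elements as a real datetime64 array first. The helper below, `isin_dates`, is hypothetical (not a pandas or NumPy function); it is a minimal sketch assuming the dates are tz-naive and that pandas can parse them into a DatetimeIndex.

```python
import numpy as np
import pandas as pd

def isin_dates(index, dates):
    """Hypothetical helper: element-wise membership test for datetime-like values.

    The dates are converted to a datetime64 array up front, so np.isin compares
    datetime64 with datetime64 instead of datetime64 with an object array.
    """
    targets = pd.DatetimeIndex(dates).values  # datetime64 array, not dtype=object
    return np.isin(np.asarray(index), targets)

test_index = pd.date_range(start='2000-01-01', end='2000-01-15', freq='B')
mask = isin_dates(test_index, [test_index[0], test_index[-1]])
print(mask)
```

Going through pd.DatetimeIndex lets pandas do the conversion of each element, which also handles strings or datetime.datetime objects in the list.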
The best bet is to use the built-in methods on a DatetimeIndex to search it, but you could also cast explicitly so numpy knows what's going on. Here are a few ways to do this:
>>> np.isin(test_index, np.datetime64(test_date))
array([ True, False, False, False, False, False, False, False, False,
False])
>>> test_index == test_date
array([ True, False, False, False, False, False, False, False, False,
False])
>>> test_index.isin([test_date])
array([ True, False, False, False, False, False, False, False, False,
False])
>>> test_date in test_index  # if you just need yes or no
True
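If you need positions rather than a boolean mask, the index's own lookup methods also work with Timestamps. This is a sketch assuming a unique, sorted index such as the one from pd.date_range; `found`, `position` and `positions` are illustrative names.

```python
import pandas as pd

test_index = pd.date_range(start='2000-01-01', end='2000-01-15', freq='B')
test_date = test_index[0]

found = test_date in test_index           # plain yes/no membership
position = test_index.get_loc(test_date)  # integer position of a single label
# get_indexer looks up several labels at once; -1 marks "not found"
positions = test_index.get_indexer([test_date, pd.Timestamp('2000-01-01')])
print(found, position, positions)
```

Note that 2000-01-01 is a Saturday, so it is not in a business-day index and gets -1.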