Reputation: 1263
I have a numpy array of Python objects. I want to compare the array against a Python object, but I don't want the comparison to use the == operator; a reference (identity) comparison is enough for my requirements.
import numpy as np
a = np.array(["abc", "def"], dtype="object")
a == "abc"
I am sure that a reference comparison is enough for my array. Let's say all the strings in my array are interned.
This is primarily to improve performance when comparing a zillion values; Python object comparisons are really slow.
a is "abc" won't do what I want because
In [1]: import numpy as np
In [2]: a = np.array(["abc", "def"], dtype="object")
In [3]: a == "abc"
Out[3]: array([ True, False], dtype=bool)
In [4]: a is "abc"
Out[4]: False
I want the same result as a == "abc", but computed with the is operator instead of Python's __eq__ method.
Upvotes: 0
Views: 386
Reputation: 28846
What about using np.vectorize?
vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])
Then you have
>>> a = np.array(["abc", "def"], dtype="object")
>>> vector_is(a, "abc")
array([ True, False], dtype=bool)
Unfortunately, I don't know if you can use operator.is_ here, because I get
ValueError: failed to determine the number of arguments for <built-in function is_>
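One possible way around that error is np.frompyfunc, which takes the number of inputs and outputs explicitly instead of inspecting the function. The following is only a sketch: the names is_ufunc, target and target_arr are illustrative, and the target is wrapped in a 0-d object array by hand as a precaution, so that NumPy compares the very same object rather than a converted copy.

```python
import operator

import numpy as np

# Wrap operator.is_ as a ufunc with 2 inputs and 1 output.
# The arity is given explicitly, so no introspection is needed.
is_ufunc = np.frompyfunc(operator.is_, 2, 1)

a = np.array(["abc", "def"], dtype=object)
target = a[0]  # the exact object stored in the array

# Store the target in a 0-d object array so its reference is kept as-is.
target_arr = np.empty((), dtype=object)
target_arr[()] = target

# frompyfunc ufuncs return object arrays, so convert the result to bool.
mask = is_ufunc(a, target_arr).astype(bool)
print(mask)
```

I have not timed this against the other approaches, so it may or may not be faster.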
The np.vectorize version seems to be slightly slower than the list comprehension (probably because of the lambda calls), though it has the advantage of being a little more flexible in terms of the arguments it takes in:
python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'vector_is(a, "abcd")'
10 loops, best of 3: 28.3 msec per loop
python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' '[x is "abcd" for x in a]'
100 loops, best of 3: 20 msec per loop
python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'np.fromiter((x is "abcd" for x in a), bool, len(a))'
10 loops, best of 3: 23.8 msec per loop
The last approach, np.fromiter((x is "abcd" for x in a), bool, len(a)), is one way to get a numpy array out of the list comprehension approach.
Unfortunately, all are much slower than just using ==:
python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'a == "abcd"'
1000 loops, best of 3: 1.42 msec per loop
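One caveat about the commands above: the setup builds the array without dtype="object", so as far as I can tell NumPy stores fixed-width strings there, the elements come back as NumPy string scalars, and the is tests never match the literal. A rough sketch of the same measurement on an object array of interned strings might look like this (the names words, target, by_eq and by_is are made up for illustration, and the numbers will depend on your machine):

```python
import random
import string
import sys
import timeit

import numpy as np

random.seed(0)

# Assumption: every string is interned, as stated in the question.
words = [
    sys.intern("".join(random.choice(string.ascii_lowercase) for _ in range(4)))
    for _ in range(100000)
]
a = np.array(words, dtype=object)
target = sys.intern("abcd")


def by_eq():
    # Elementwise comparison through Python's equality protocol.
    return a == target


def by_is():
    # Elementwise identity comparison, collected into a bool array.
    return np.fromiter((x is target for x in a), dtype=bool, count=len(a))


# With interned strings the two approaches should agree.
assert np.array_equal(by_eq(), by_is())

for fn in (by_eq, by_is):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} msec per call")
```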
Upvotes: 0
Reputation: 500437
a reference comparison is enough for my requirements
To compare object identity, use is instead of ==:
if a is b:
...
From the documentation:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.
edit: To apply is to every element of your array, you could use:
In [6]: list(map(lambda x: x is "abc", a))
Out[6]: [True, False]
or simply:
In [9]: [x is "abc" for x in a]
Out[9]: [True, False]
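Keep in mind that this only matches when the array holds the very same object, which is why the interning assumption matters. Here is a small sketch of the difference, assuming CPython; target, runtime and the result names are just for illustration. It compares against a variable rather than a literal, since recent Python versions emit a SyntaxWarning for is with a literal.

```python
import sys

import numpy as np

target = sys.intern("abc")

# Build an equal string at runtime; in CPython this is a separate object.
runtime = "".join(["a", "b", "c"])

# Not interned: equal by value, but a different object.
a = np.array([runtime, "def"], dtype=object)
by_value = a == target
by_identity = np.array([x is target for x in a], dtype=bool)

# Interned: sys.intern returns the one shared object for this value.
b = np.array([sys.intern(runtime), "def"], dtype=object)
by_identity_interned = np.array([x is target for x in b], dtype=bool)

print(by_value, by_identity, by_identity_interned)
```

If you cannot guarantee that every string was interned before it went into the array, the identity test can silently miss matches that == would find.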
Upvotes: 3