Senthil Babu

Reputation: 1263

Compare only object references in numpy

I have a NumPy array of Python objects. I want to compare the array against a Python object, but I don't want the comparison to go through the == operator; a plain reference comparison is enough for my requirements.

import numpy as np
a = np.array(["abc", "def"], dtype="object")
a == "abc"

I am sure that a reference comparison is enough for my array. Let's say all the strings in my array are interned.

This is primarily to improve performance when comparing zillions of values. Python object comparisons are really slow.

a is "abc" won't do what I want, because it tests the array object itself rather than each element:

In [1]: import numpy as np

In [2]: a = np.array(["abc", "def"], dtype="object")

In [3]: a == "abc"
Out[3]: array([ True, False], dtype=bool)

In [4]: a is "abc"
Out[4]: False

I want the result of a == "abc", but I don't want Python's __eq__ method to be used to compute it, just the is operator.

Upvotes: 0

Views: 386

Answers (2)

Danica

Reputation: 28846

How about using np.vectorize:

vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])

Then you have

>>> a = np.array(["abc", "def"], dtype="object")

>>> vector_is(a, "abc")
array([ True, False], dtype=bool)

Unfortunately, I don't know whether you can use operator.is_ here, because when I try it I get:

ValueError: failed to determine the number of arguments for <built-in function is_>
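One possible way around that error is np.frompyfunc, which takes the number of inputs and outputs explicitly and so does not need to inspect the built-in. The following is only a sketch: identity_mask and elementwise_is are hypothetical names, and the target is wrapped in a 0-d object array on the assumption that this keeps NumPy from converting it to a new string object before the comparison.

```python
import operator
import sys

import numpy as np

# frompyfunc is told the argument counts (2 inputs, 1 output) directly,
# so it never has to work out the signature of the built-in is_.
elementwise_is = np.frompyfunc(operator.is_, 2, 1)


def identity_mask(arr, target):
    """Return a bool array that is True where arr[i] is target (same object)."""
    # Box the target in a 0-d object array so the original object is
    # handed to operator.is_ unchanged.
    boxed = np.empty((), dtype=object)
    boxed[()] = target
    # frompyfunc always returns an object array; convert it to bool.
    return elementwise_is(arr, boxed).astype(bool)


# Explicit interning makes the identity assumption visible.
key = sys.intern("abc")
a = np.array([key, sys.intern("def")], dtype=object)
mask = identity_mask(a, key)
print(mask)
```

A string that is equal to "abc" but is a different object, such as one built at runtime with "".join, would not match under this comparison.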

This seems to be slightly slower than the list comprehension (probably because of the lambda calls), though it has the advantage of being a little more flexible in terms of the arguments it takes in:

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'vector_is(a, "abcd")'
10 loops, best of 3: 28.3 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' '[x is "abcd" for x in a]'
100 loops, best of 3: 20 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'np.fromiter((x is "abcd" for x in a), bool, len(a))'
10 loops, best of 3: 23.8 msec per loop

The last approach, np.fromiter((x is "abcd" for x in a), bool, len(a)), is one way to get a numpy array out of the list comprehension approach.

Unfortunately, all of these are much slower than just using ==:

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'a == "abcd"'
1000 loops, best of 3: 1.42 msec per loop
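One caveat about the commands above: they build the array without dtype="object", so NumPy appears to store fixed-width strings and == can compare them natively, which may explain much of the gap. Here is a rough sketch of the same comparison on an object array of interned strings, as described in the question. The seed, the candidates dict, and the forced match at index 0 are my own additions for illustration, and timings will vary by machine and NumPy version.

```python
import random
import string
import sys
import timeit

import numpy as np

random.seed(0)  # arbitrary seed, only for repeatability

# Interned 4-letter strings stored by reference in an object array.
target = sys.intern("abcd")
words = [
    sys.intern("".join(random.choice(string.ascii_lowercase) for _ in range(4)))
    for _ in range(100000)
]
words[0] = target  # guarantee at least one match
a = np.array(words, dtype=object)

vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])

candidates = {
    "np.vectorize": lambda: vector_is(a, target),
    "list comprehension": lambda: [x is target for x in a],
    "np.fromiter": lambda: np.fromiter((x is target for x in a), bool, len(a)),
    "==": lambda: a == target,
}

for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name:20s} {best * 1e3:8.2f} msec per call")
```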

Upvotes: 0

NPE

Reputation: 500437

"a reference comparison is enough for my requirements"

To compare object identity, use is instead of ==:

if a is b:
   ...

From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

edit: To apply is to every element of your array, you could use:

In [6]: map(lambda x:x is "abc", a)
Out[6]: [True, False]

or simply:

In [9]: [x is "abc" for x in a]
Out[9]: [True, False]
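On Python 3, map returns an iterator rather than a list, and recent versions warn when is is used with a string literal. A minimal sketch of the same idea under those conditions, binding the target to a name first (target, as_list, as_map, mask and selected are illustrative names only):

```python
import sys

import numpy as np

# Bind the object once and compare against that name, not a literal.
target = sys.intern("abc")
a = np.array([target, sys.intern("def")], dtype=object)

# Plain Python lists of identity results.
as_list = [x is target for x in a]
as_map = list(map(lambda x: x is target, a))  # list() is needed on Python 3

# A boolean NumPy mask, which can then be used to index the array.
mask = np.fromiter((x is target for x in a), dtype=bool, count=len(a))
selected = a[mask]
print(as_list, as_map, mask, selected)
```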

Upvotes: 3
