Reputation: 14826
My question is probably very simple, but I can't figure out a way to make this operation faster:
print a[(b==c[i]) for i in arange(0,len(c))]
where a, b and c are three numpy arrays. I'm dealing with arrays with millions of entries, and the piece of code above is the bottleneck of my program.
Upvotes: 0
Views: 165
Reputation: 5362
How about np.where():
>>> a = np.array([2,4,8,16])
>>> b = np.array([0,0,0,0])
>>> c = np.array([1,0,0,1])
>>> bc = np.where(b==c)[0] #indices where b == c
>>> a[bc]
array([4, 8])
This should do the trick. I'm not sure whether the timing is good enough for your purposes, but on arrays of a million entries:
>>> a = np.random.randint(0,10000,1000000)
>>> b = np.random.randint(0,10000,1000000)
>>> c = np.random.randint(0,10000,1000000)
>>> %timeit( a[ np.where( b == c )[0] ] )
100 loops, best of 3: 11.3 ms per loop
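As a sketch of why the np.where step is optional here: indexing with the boolean mask b == c directly selects the same elements as indexing with the integer positions that np.where returns. The arrays are the ones from the small example above; the names mask, via_where and via_mask are only illustrative.

```python
import numpy as np

a = np.array([2, 4, 8, 16])
b = np.array([0, 0, 0, 0])
c = np.array([1, 0, 0, 1])

mask = b == c                      # elementwise comparison, gives a boolean array
via_where = a[np.where(mask)[0]]   # index with the integer positions of the True entries
via_mask = a[mask]                 # index with the boolean mask directly

print(via_where)
print(via_mask)
```

Both forms return the same values, so which one is faster on a given machine is worth measuring rather than assuming.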
Upvotes: 0
Reputation: 19537
You should probably look into broadcasting. I assume you are looking for something like the following?
>>> b=np.arange(5)
>>> c=np.arange(6).reshape(-1,1)
>>> b
array([0, 1, 2, 3, 4])
>>> c
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
>>> b==c
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False]], dtype=bool)
>>> np.any(b==c,axis=1)
array([ True,  True,  True,  True,  True, False], dtype=bool)
For large arrays you can compare broadcasting against np.in1d:
import timeit
s="""
import numpy as np
array_size=500
a=np.random.randint(500, size=(array_size))
b=np.random.randint(500, size=(array_size))
c=np.random.randint(500, size=(array_size))
"""
ex1="""
a[np.any(b==c.reshape(-1,1),axis=0)]
"""
ex2="""
a[np.in1d(b,c)]
"""
print('Example 1 took %s seconds.' % timeit.timeit(ex1, setup=s, number=100))
print('Example 2 took %s seconds.' % timeit.timeit(ex2, setup=s, number=100))
When array_size is 50:
Example 1 took 0.00323104858398 seconds.
Example 2 took 0.0125901699066 seconds.
When array_size is 500:
Example 1 took 0.142632007599 seconds.
Example 2 took 0.0283041000366 seconds.
When array_size is 5,000:
Example 1 took 16.2110910416 seconds.
Example 2 took 0.170011043549 seconds.
When array_size is 50,000 (number=5):
Example 1 took 33.0327301025 seconds.
Example 2 took 0.0996031761169 seconds.
Note that I had to change the axis passed to np.any() so that both examples return the same result. To get the opposite test (which elements of c occur in b), reverse the argument order of np.in1d or switch the axis of np.any. You could move the reshape out of the timed statement in example 1, but reshape is very fast, so it barely affects the timing. Really interesting; I will have to use this in the future.
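A rough sketch of why the gap widens with size, using small made-up arrays: the broadcast comparison builds a full len(c) x len(b) boolean matrix before reducing it, while a membership test does not need that matrix. np.isin is used here as the newer spelling of np.in1d (it assumes NumPy 1.13 or later); the array sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.RandomState(0)          # arbitrary seed, only for repeatability
a = rng.randint(500, size=200)
b = rng.randint(500, size=200)
c = rng.randint(500, size=300)

pairwise = b == c.reshape(-1, 1)        # shape (len(c), len(b)): one row per element of c
in_c_broadcast = np.any(pairwise, axis=0)   # for each element of b: does it occur anywhere in c?
in_c_isin = np.isin(b, c)               # same question, without building the pairwise matrix

print(pairwise.shape)                   # (300, 200)
print(np.array_equal(in_c_broadcast, in_c_isin))
print(a[in_c_isin][:10])                # first few selected values of a
```

With a million entries in each array the pairwise matrix would hold 10^12 booleans, roughly a terabyte at one byte each, so the broadcasting form is only practical for small inputs.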
Upvotes: 2
Reputation: 69172
Are you trying to get the values of a where b==c?
If so, you can just do a[b==c]:
import numpy as np
a = np.arange(11)
b = 11*a
c = b[::-1]
print(a)  # [ 0  1  2  3  4  5  6  7  8  9 10]
print(b)  # [  0  11  22  33  44  55  66  77  88  99 110]
print(c)  # [110  99  88  77  66  55  44  33  22  11   0]
print(a[b==c])  # [5]
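Note that a[b==c] compares the arrays position by position, which is a different question from asking whether each element of b occurs anywhere in c. A small sketch of the difference, reusing the arrays above and assuming np.isin (NumPy 1.13 or later) for the membership test; the names same_position and anywhere_in_c are only illustrative.

```python
import numpy as np

a = np.arange(11)
b = 11 * a
c = b[::-1]

same_position = a[b == c]          # keeps a[i] where b[i] == c[i] at the same index i
anywhere_in_c = a[np.isin(b, c)]   # keeps a[i] where b[i] equals some element of c

print(same_position)   # [5]
print(anywhere_in_c)   # every element of a, since c is just b reversed
```

Which of the two the original loop was meant to compute decides which answer in this thread applies.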
Upvotes: 4