Brian

Reputation: 14826

Slow array operation in python

My question is probably very simple but I can't figure out a way to make this operation faster

  print a[(b==c[i]) for i in arange(0,len(c))]

where a, b and c are three numpy arrays. I'm dealing with arrays with millions of entries, and the piece of code above is the bottleneck of my program.

Upvotes: 0

Views: 165

Answers (3)

dermen

Reputation: 5362

How about np.where():

>>> a  = np.array([2,4,8,16])
>>> b  = np.array([0,0,0,0])
>>> c  = np.array([1,0,0,1])
>>> bc = np.where(b==c)[0] #indices where b == c
>>> a[bc]
array([4, 8])

This should do the trick, though I'm not sure whether the timing is good enough for your purposes:

>>> a = np.random.randint(0,10000,1000000)
>>> b = np.random.randint(0,10000,1000000)
>>> c = np.random.randint(0,10000,1000000)
>>> %timeit( a[ np.where( b == c )[0] ]   )
100 loops, best of 3: 11.3 ms per loop

Upvotes: 0

Daniel

Reputation: 19537

You should probably look into broadcasting. I assume you are looking for something like the following?

>>> b=np.arange(5)
>>> c=np.arange(6).reshape(-1,1)
>>> b
array([0, 1, 2, 3, 4])
>>> c
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
>>> b==c
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False]], dtype=bool)
>>> np.any(b==c,axis=1)
array([ True,  True,  True,  True,  True, False], dtype=bool)

For large arrays, you can compare broadcasting against np.in1d:

import timeit

s="""
import numpy as np
array_size=500
a=np.random.randint(500, size=(array_size))
b=np.random.randint(500, size=(array_size))
c=np.random.randint(500, size=(array_size))
"""

ex1="""
a[np.any(b==c.reshape(-1,1),axis=0)]
"""

ex2="""
a[np.in1d(b,c)]
"""

print('Example 1 took', timeit.timeit(ex1, setup=s, number=100), 'seconds.')
print('Example 2 took', timeit.timeit(ex2, setup=s, number=100), 'seconds.')

When array_size is 50:

Example 1 took 0.00323104858398 seconds.
Example 2 took 0.0125901699066 seconds.

When array_size is 500:

Example 1 took 0.142632007599 seconds.
Example 2 took 0.0283041000366 seconds.

When array_size is 5,000:

Example 1 took 16.2110910416 seconds.
Example 2 took 0.170011043549 seconds.

When array_size is 50,000 (number=5):

Example 1 took 33.0327301025 seconds.
Example 2 took 0.0996031761169 seconds.

Note that I had to change the axis passed to np.any() so that both examples give the same result. To get the opposite selection, reverse the argument order of np.in1d or switch the axis of np.any. You could take the reshape out of Example 1, but reshape is really quite fast. Really interesting; I will have to use this in the future.
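To check that the two timed expressions really select the same elements, here is a minimal sketch. It uses np.isin in place of np.in1d (an assumption about your NumPy version: np.isin needs 1.13 or later), and the seed and array size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed, only for repeatability
n = 500
a = rng.randint(500, size=n)
b = rng.randint(500, size=n)
c = rng.randint(500, size=n)

# Broadcasting: compare every c[i] (rows) with every b[j] (columns),
# then ask, per column, whether any row matched.
pairwise = b == c.reshape(-1, 1)
mask_broadcast = np.any(pairwise, axis=0)

# Membership test: True where b[j] occurs anywhere in c.
mask_isin = np.isin(b, c)

same = np.array_equal(mask_broadcast, mask_isin)
print(same)
print(pairwise.nbytes)  # bytes used by the temporary n x n boolean array
```

The broadcast version builds a len(c) x len(b) boolean array, which is probably why it slows down so sharply in the timings above. With a million entries in each array, that temporary alone would need on the order of 10^12 bytes.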

Upvotes: 2

tom10

Reputation: 69172

Are you trying to get the values of a where b==c?

If so, you can just do a[b==c]:

from numpy import *

a = arange(11)
b = 11*a
c = b[::-1]

print(a)        # [  0   1   2   3   4   5   6   7   8   9  10]
print(b)        # [  0  11  22  33  44  55  66  77  88  99 110]
print(c)        # [110  99  88  77  66  55  44  33  22  11   0]
print(a[b==c])  # [5]
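Note that a[b==c] compares position by position, which may or may not be what the question intends. If the goal is instead the values of a wherever b matches any entry of c, a membership test gives a different answer. A small sketch on the same arrays, assuming np.isin is available (NumPy 1.13 or later):

```python
import numpy as np

a = np.arange(11)
b = 11 * a
c = b[::-1]

# Elementwise: keep a[i] only where b[i] == c[i] at the same position.
elementwise = a[b == c]

# Membership: keep a[i] where b[i] appears anywhere in c.
membership = a[np.isin(b, c)]

print(elementwise)
print(membership)
```

Here c is just b reversed, so every entry of b occurs somewhere in c and the membership version keeps all of a, while the elementwise version keeps only the middle element.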

Upvotes: 4
