XYZ
XYZ

Reputation: 385

Populating NumPy array with most performance

I want to populate a matrix by function f() which consumes arrays a, b, c and d:

enter image description here

A nested loop is possible but I'm looking for a faster way. I tried np.fromfunction with no luck. Function f has a conditional so the solution should preferably support conditionals. Example function:

def f(a,b,c,c):
    return a+b+c+d if a==b else a*b*c*d

How np.fromfunction failed:

>>> a = np.array([1,2,3,4,5])
>>> b = np.array([10,20,30])
>>> def f(i,j): return a[i] * b[j]
>>> np.fromfunction(f, (3,5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1853, in fromfunction
    return function(*args, **kwargs)
  File "<stdin>", line 1, in fun
IndexError: arrays used as indices must be of integer (or boolean) type

Upvotes: 0

Views: 84

Answers (2)

pho
pho

Reputation: 25489

Jake has explained why your fromfunction approach fails. However, you don't need fromfunction for your example. You could simply add an axis to b and have numpy broadcast the shapes:

a = np.array([1,2,3,4,5])
b = np.array([10,20,30])
def fun(i,j): return a[j.astype(int)] * b[i.astype(int)]

f1 = np.fromfunction(fun, (3, 5))
f2 = b[:, None] * a

(f1 == f2).all() # True

Extending this to the function you showed that contains an if condition, you could just split the if into two operations in sequence: creating an array given by the if expression, and overwriting the relevant parts by the else expression.

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])

d = np.array([0, 1, 2, 3])

# Calculate the values at all indices as the product
result = d[:, None] * (a * b * c)
# array([[   0,    0,    0,    0,    0],
#        [ 500, 1600, 2700, 3200, 2500],
#        [1000, 3200, 5400, 6400, 5000],
#        [1500, 4800, 8100, 9600, 7500]])

# Calculate sum 
sum_arr = d[:, None] + (a + b + c)
# array([[106, 206, 306, 406, 506],
#        [107, 207, 307, 407, 507],
#        [108, 208, 308, 408, 508],
#        [109, 209, 309, 409, 509]])

# Set diagonal elements (i==j) to sum:
np.fill_diagonal(result, np.diag(sum_arr))

which gives the following result:

array([[ 106,    0,    0,    0,    0],
       [ 500,  207, 2700, 3200, 2500],
       [1000, 3200,  308, 6400, 5000],
       [1500, 4800, 8100,  409, 7500]])

Upvotes: 1

jakevdp
jakevdp

Reputation: 86330

The reason the function fails is that np.fromfunction passes floating-point values, which are not valid as indices. You can modify your function like this to make it work:

def fun(i,j):
  return a[j.astype(int)] * b[i.astype(int)]

print(np.fromfunction(fun, (3,5)))
[[ 10  20  30  40  50]
 [ 20  40  60  80 100]
 [ 30  60  90 120 150]]

Upvotes: 1

Related Questions