Reputation: 489
I have an array a of numbers.
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
# array([-11.36680112, -15.83791699, -39.86827287, -33.81273354, -19.55547753])
I want to find the difference between the array and each element of the same array. The example output would be:
[array([  0.        ,   4.47111586,  28.50147174,  22.44593241,   8.18867641]),
 array([ -4.47111586,   0.        ,  24.03035588,  17.97481655,   3.71756054]),
 array([-28.50147174, -24.03035588,   0.        ,  -6.05553933, -20.31279534]),
 array([-22.44593241, -17.97481655,   6.05553933,   0.        , -14.25725601]),
 array([ -8.18867641,  -3.71756054,  20.31279534,  14.25725601,   0.        ])]
My first approach was to use a list comprehension: [i - a for i in a]. However, my original array a is quite large, and I have thousands of such arrays on which I need to do the same operation, so the overall process becomes slow and memory-hungry to the point where the Jupyter kernel dies.
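To put rough numbers on it: the list of differences holds n * n float64 values, so at, for example, n = 40000 that is already 40000 * 40000 * 8 bytes, roughly 12.8 GB for a single a, which explains the dying kernel. A minimal runnable version of the baseline with the small example array:
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# one length-n array per element of a, n arrays in total -> n * n values
diffs = [i - a for i in a]
print(np.array(diffs).shape)  # (5, 5)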
Is there any possible way I could speed this up?
Upvotes: 2
Views: 185
Reputation: 1815
There are two ways you can do this. One uses only NumPy vectorized broadcasting:
a[:, None] - a
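Equivalently (and a bit more explicit about intent), the same outer difference can be written with np.subtract.outer; a small sketch:
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# b[i, j] = a[i] - a[j]; identical result to a[:, None] - a
b = np.subtract.outer(a, a)
assert np.allclose(b, a[:, None] - a)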
For a size of 40000, the explicit-loop Numba version below finishes in about 3 s without parallelism and about 0.6 s with parallelism on my 12-core machine.
import numpy as np
import numba as nb

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

# return type nb.float64[:, :]
# input argument type nb.float64[:]
# By specifying these you can do eager compilation instead of lazy.
# You can also add parallel=True, cache=True;
# if you are using Python threading then nogil=True.
# You can do lots of stuff.
# Numba has SIMD vectorization, which means it should not lose to NumPy
# on performance grounds if coded properly.
@nb.njit(nb.float64[:, :](nb.float64[:]))
def speed(a):
    # empty to avoid unnecessary initialization
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    # nb.prange tells Numba this loop can be parallelized
    # (it only actually runs in parallel with parallel=True)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

speed(a)
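As a quick sanity check (a small sketch, assuming the speed function above has already been defined and compiled), the Numba result matches the broadcasting one-liner:
check = rnd.uniform(0, -50, 1000)
assert np.allclose(speed(check), check[:, None] - check)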
Performance
import numpy as np
import numba as nb
import sys
import time

@nb.njit(nb.float64[:, :](nb.float64[:]))
def f1(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

@nb.njit(nb.float64[:, :](nb.float64[:]), parallel=True, cache=True)
def f2(a):
    b = np.empty((a.shape[0], a.shape[0]), dtype=a.dtype)
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[0]):
            b[i][j] = a[i] - a[j]
    return b

def f3(a):
    return a[:, None] - a

if __name__ == '__main__':
    s0 = time.time()
    rnd = np.random.default_rng(12345)
    a = rnd.uniform(0, -50, int(sys.argv[2]))
    b = eval(sys.argv[1] + '(a)')
    print(time.time() - s0)
(base) xxx:~$ python test.py f1 40000
3.0324509143829346
(base) xxx:~$ python test.py f2 40000
0.6196465492248535
(base) xxx:~$ python test.py f3 40000
2.4126882553100586
I faced similar restrictions and needed something fast. I got around a 50x speedup without parallelism just by fixing the memory usage and using Numba; see Why are np.hypot and np.subtract.outer very fast?
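On the memory side, one pattern (a sketch of my own, assuming all the arrays have the same length) is to preallocate a single (n, n) buffer and reuse it for every array via the out= argument of np.subtract, instead of allocating a fresh matrix thousands of times:
import numpy as np

rnd = np.random.default_rng(12345)
arrays = [rnd.uniform(0, -50, 2000) for _ in range(100)]  # stand-in for the thousands of a's

n = arrays[0].shape[0]
buf = np.empty((n, n))  # one reusable (n, n) output buffer

for a in arrays:
    # writes a[:, None] - a into buf without allocating a new matrix each iteration
    np.subtract(a[:, None], a, out=buf)
    # ... consume buf here (e.g. reduce it) before the next iteration overwrites it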
Upvotes: 2
Reputation: 2119
The easiest way is to use broadcasting:
import numpy as np
rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)
a[:, None] - a
which outputs:
array([[  0.        ,   4.47111586,  28.50147174,  22.44593241,   8.18867641],
       [ -4.47111586,   0.        ,  24.03035588,  17.97481655,   3.71756054],
       [-28.50147174, -24.03035588,   0.        ,  -6.05553933, -20.31279534],
       [-22.44593241, -17.97481655,   6.05553933,   0.        , -14.25725601],
       [ -8.18867641,  -3.71756054,  20.31279534,  14.25725601,   0.        ]])
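The reason this works is plain broadcasting: a[:, None] has shape (n, 1), a has shape (n,), and subtracting them broadcasts both to (n, n), so row i holds a[i] - a. A quick shape check:
import numpy as np

rnd = np.random.default_rng(12345)
a = rnd.uniform(0, -50, 5)

print(a.shape)                 # (5,)
print(a[:, None].shape)        # (5, 1)
print((a[:, None] - a).shape)  # (5, 5), row i is a[i] - a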
Upvotes: 3