I have an image with a thick line of pixels, and another line under it. I wanted to remove the line underneath, so I used this loop:
```python
import numpy as np

N = 1000
im = (np.random.random((N, N)) - 0.5)
xx, yy = np.where(im > 0)
# xmin, xmax, ymin, ymax: limits tuned to my specific image (values not shown)
for x, y in zip(xx, yy):
    for i in range(xmin, xmax):  # I played with the limits so they would fit my specific image
        # if inner loop already broke
        if im[x][y] == False:
            break
        for j in range(ymin, ymax):  # here again
            if im[x-i][y+j]:
                im[x][y] = False
                break
```
This works really well (~95% of the unwanted pixels are removed), but it is very slow: it takes about 1 second per image, whereas operations like `np.where` and `np.argmax` take < 0.01 s.
How would one implement this using NumPy (I'm guessing NumPy would suit best) to speed it up?
Edit: using `@numba.jit` as suggested by @jmd_dk was very helpful, but it still seems to be slower than the plain NumPy methods.
To clarify, I want to find not only the locations of the positive pixels, as provided by `np.where(im > 0)`, but the locations of pixels that have positive pixels below/above them.
So, if I had this matrix:
```
0 | 0 | 0 | 1 | 1 | 1 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 0 | 1 | 1
0 | 0 | 0 | 1 | 1 | 0 | 1
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 1 | 1 | 1
```
I want to find all the `1` pixels that have a `1` above them and remove them, getting this matrix:
```
0 | 0 | 0 | 1 | 1 | 1 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | * | 0 | * | *
0 | 0 | 0 | * | * | 0 | *
0 | 0 | 0 | 0 | 0 | 0 | *
0 | * | 0 | * | * | * | *
```
I replaced the removed 1s with `*` so they stand out.
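If the rule is exactly what the two matrices show (a positive pixel is removed when any positive pixel lies somewhere above it in the same column, with row 0 at the top), the loop can be replaced by a cumulative OR down each column. This is a sketch under that assumption, not a drop-in replacement for the loop's `xmin`/`xmax`/`ymin`/`ymax` window; `remove_pixels_with_pixel_above` is a hypothetical helper name.

```python
import numpy as np


def remove_pixels_with_pixel_above(im):
    # Hypothetical helper: zero out every positive pixel that has at least one
    # positive pixel somewhere above it in the same column (row 0 is the top).
    mask = im > 0
    # seen[r, c] is True if any of rows 0..r in column c is positive
    seen = np.logical_or.accumulate(mask, axis=0)
    # has_above[r, c] is True if any of rows 0..r-1 in column c is positive
    has_above = np.zeros_like(mask)
    has_above[1:] = seen[:-1]
    out = im.copy()
    out[mask & has_above] = 0
    return out


example = np.array([
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 1, 1, 1],
])
print(remove_pixels_with_pixel_above(example))
```

Under this rule only the topmost positive pixel of each column survives, which matches the `*` pattern above. Everything is whole-array operations, so it should scale like `np.where` rather than like a Python loop.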
This is a case where Numba really shines. Without any real work, I immediately get a speedup of ~115x (times, not percent!). I don't have your entire code, but consider this example:
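```python
import numpy as np
import numba
from time import time

# Note: time() is not supported in Numba's nopython mode. This worked with older
# Numba, where @numba.jit fell back to object mode (still compiling the loop);
# since Numba 0.59, @numba.jit defaults to nopython mode, so there the timing
# has to move outside the function.
@numba.jit
def fun():
    # Added just to make the code run
    t0 = time()
    N = 1000
    im = (np.random.random((N, N)) - 0.5)
    xmin = ymin = 0
    xmax = ymax = N
    # Your code
    xx, yy = np.where(im > 0)[0], np.where(im > 0)[1]
    for x, y in zip(xx, yy):
        for i in range(xmin, xmax):
            if im[x][y] == False:
                break
            for j in range(ymin, ymax):
                if im[x-i][y+j]:
                    im[x][y] = False
                    break
    t1 = time()
    print('took', t1 - t0, 's')

fun()
fun()
```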
On my machine, I get

```
took 0.18608522415161133 s
took 0.0416417121887207 s
```

Now remove the `numba.jit` decorator, and I get

```
took 4.783859491348267 s
took 4.796429872512817 s
```
The easiest way to get the Numba package is by using the Anaconda Python distribution.
You should then call the function (here `fun()`) once for each image. The first time the function is called, Numba compiles it to fast machine code, which is why the first call is much slower than the second (though still much faster than the plain, non-Numba version).
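For current Numba versions, one way to follow the "call it once per image" advice is to pass each image in as an argument and time the calls from outside the compiled function. The sketch below assumes `numba.njit` (nopython mode) and falls back to plain Python if Numba is not installed. `remove_pixels` and its parameters are hypothetical names. Unlike the original loop, it only counts positive neighbours (the original `if im[x-i][y+j]:` also counts negative values), and it skips out-of-range neighbours instead of wrapping around through negative indices. The demo limits (1 to 9 rows up, same column) are illustrative only.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed
    def njit(func):
        return func


@njit
def remove_pixels(im, imin, imax, jmin, jmax):
    # Hypothetical signature: the question's loop with the image and the limits
    # passed in. A positive pixel im[x, y] is zeroed in place if any
    # im[x - i, y + j] with imin <= i < imax and jmin <= j < jmax is positive.
    n_rows, n_cols = im.shape
    for x in range(n_rows):
        for y in range(n_cols):
            if im[x, y] <= 0:
                continue
            found = False
            for i in range(imin, imax):
                xi = x - i
                if xi < 0 or xi >= n_rows:
                    continue  # neighbour row outside the image
                for j in range(jmin, jmax):
                    yj = y + j
                    if yj >= 0 and yj < n_cols and im[xi, yj] > 0:
                        found = True
                        break
                if found:
                    break
            if found:
                im[x, y] = 0
    return im


if __name__ == "__main__":
    N = 1000
    for call in range(2):
        im = np.random.random((N, N)) - 0.5
        t0 = time.perf_counter()
        # i starting at 1 looks strictly upward; j in [0, 1) stays in the same column
        remove_pixels(im, 1, 10, 0, 1)
        print("call", call, "took", time.perf_counter() - t0, "s")
```

With Numba installed, the first timed call includes compilation; later calls on new images with the same dtype and number of dimensions reuse the compiled code.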