CIsForCookies

Reputation: 12817

A faster way to remove close pixels

I have an image with a thick line of pixels and a second line beneath it. I wanted to

  1. remove the bottom line
  2. thin the thick line

so I used this loop:

import numpy as np

N = 1000
im = np.random.random((N, N)) - 0.5

# xmin, xmax, ymin, ymax are search-window limits that I tuned by hand for my specific image
xx, yy = np.where(im > 0)
for x, y in zip(xx, yy):
    for i in range(xmin, xmax):
        # stop if the inner loop already cleared this pixel
        if im[x][y] == False:
            break
        for j in range(ymin, ymax):
            if im[x-i][y+j]:
                im[x][y] = False
                break

This works really well (~95% of the unwanted pixels are removed), but it is very slow: it takes about 1 second per image, whereas operations like np.where and np.argmax take < 0.01 s.

How would one implement this using numpy (I'm guessing numpy would suit best) to speed it up?

Edit: using @numba.jit as suggested by @jmd_dk was very helpful, but it still seems to be slower than the normal numpy methods.

To clarify, I want to find not only the locations of the positive pixels, as provided by np.where(im > 0), but the locations of pixels that have positive pixels below / above them...

So, if I would have this matrix:

0 | 0 | 0 | 1 | 1 | 1 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 0 | 1 | 1
0 | 0 | 0 | 1 | 1 | 0 | 1
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | 1 | 1 | 1 | 1

I would want to find all the '1' pixels that have a '1' somewhere above them in the same column and remove them, getting this matrix:

0 | 0 | 0 | 1 | 1 | 1 | 0
0 | 0 | 0 | 0 | 0 | 0 | 1
0 | 1 | 0 | * | 0 | * | *
0 | 0 | 0 | * | * | 0 | *
0 | 0 | 0 | 0 | 0 | 0 | *
0 | * | 0 | * | * | * | *

I replaced each removed 1 with * so it would stand out...

Upvotes: 0

Views: 59

Answers (1)

jmd_dk

Reputation: 13090

This is a case where Numba really shines. Without any real work, I immediately get a speedup of ~115x (times, not percent!). I don't have your entire code, but consider this example:

import numpy as np
import numba
from time import time

@numba.jit
def fun():
    # Added just to make the code run
    t0 = time()
    N = 1000
    im = (np.random.random((N, N)) - 0.5)
    xmin = ymin = 0
    xmax = ymax = N
    # Your code
    xx,yy =  np.where(im > 0)[0], np.where(im > 0)[1]
    for x,y in zip(xx,yy):
        for i in range(xmin,xmax):
            if im[x][y] == False:
                break
            for j in range(ymin,ymax):
                if im[x-i][y+j]:
                    im[x][y] = False
                    break
    t1 = time()
    print('took', t1 - t0, 's')

fun() 
fun()

On my machine, I get

took 0.18608522415161133 s

took 0.0416417121887207 s

Now remove the numba.jit decorator, and I get

took 4.783859491348267 s

took 4.796429872512817 s

The easiest way to get the Numba package is by using the Anaconda Python distribution.

You should then call the function (here fun()) once for each image. The first time the function is called, Numba will compile it to fast code, which is why the first call is much slower than the second (though still much faster than the normal, non-Numba version).
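For the clarified example in the question (remove every 1 that has a 1 somewhere above it in the same column), a loop-free NumPy version is also possible. The following is only a sketch under that reading of the requirement: the function name keep_topmost_per_column is made up for illustration, and it does not reproduce the hand-tuned xmin/xmax/ymin/ymax window of the original loop. It uses a cumulative sum down each column to count how many positive pixels have been seen so far, and keeps a pixel only if it is the first one.

```python
import numpy as np

def keep_topmost_per_column(im):
    """Zero every positive pixel that has a positive pixel above it in its column.

    Hypothetical helper: assumes "above" means anywhere higher in the same column.
    """
    mask = im > 0
    # For each position, count the positive pixels at or above it in its column
    seen = np.cumsum(mask, axis=0)
    # A positive pixel survives only if it is the first positive one in its column
    keep = mask & (seen == 1)
    return np.where(keep, im, 0)

# The matrix from the question
example = np.array([
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 1, 1, 1, 1],
])

result = keep_topmost_per_column(example)
print(result)
```

Since this is a handful of whole-array operations, it should run in roughly the same time range as np.where on an image of the same size, but I have not timed it against the Numba version above.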

Upvotes: 1
