Vectorize a Python function that blurs images

I have made a function using pure Python to blur whatever image I give. Now I want to use NumPy and remove all for loops.

def blur_image(src, dst):
    (h, w, c) = src.shape

    for x in range(h-1):
        for y in range(w-1):
            for z in range(c):
                if x != 0 or y != 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x-1, y-1, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) / 9
                if x == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) / 9
                if y == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y, z]
                    + src[x, y+1, z]
                    + src[x-1, y, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y, z]
                    + src[x+1, y+1, z]) / 9
    return dst

How can I make this code better with NumPy? Best case is to remove all for loops? Any tips?

EDIT: I am trying to solve this problem without any libraries to help me blur my image. I blur it using pure python, and now looking for tips so that all computationally heavy bits use numpy arrays. Dont have a lot of experience using numpy ://

Upvotes: 2

Answers (2)

max9111

Reputation: 6482

Simply compile it

I assume that you want to speed up your function. The simplest and likely also the fastest way to to that is to use a Compiler like Cython or Numba.

The problem is also embarrassingly parallel.

Example

import numpy as np
import numba as nb

@nb.njit(error_model="numpy",parallel=True)
def blur_image_nb(src, dst):
    (h, w, c) = src.shape
    for x in nb.prange(h-1):
        for y in range(w-1):
            for z in range(c):
                if x != 0 or y != 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x-1, y-1, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) / 9
                if x == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) / 9
                if y == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y, z]
                    + src[x, y+1, z]
                    + src[x-1, y, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y, z]
                    + src[x+1, y+1, z]) / 9
    return dst

Edit

Of course there are some simple optimizations possible, like the uneccessary division by a constant value. You can also multiply by a factor which is calculated only once.

#You can achieve the same with fastmath=True
#Divisions are quite costly
@nb.njit(error_model="numpy",parallel=True)
def blur_image_nb_2(src, dst):
    (h, w, c) = src.shape
    fact=1./9.
    for x in nb.prange(h-1):
        for y in range(w-1):
            for z in range(c):
                if x != 0 or y != 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x-1, y-1, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) *fact
                if x == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x, y, z]
                    + src[x+1, y, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x, y-1, z]
                    + src[x, y+1, z]
                    + src[x+1, y-1, z]
                    + src[x+1, y+1, z]) *fact
                if y == 0:
                    dst[x, y, z] = (src[x, y, z]
                    + src[x-1, y, z]
                    + src[x+1, y, z]
                    + src[x, y, z]
                    + src[x, y+1, z]
                    + src[x-1, y, z]
                    + src[x-1, y+1, z]
                    + src[x+1, y, z]
                    + src[x+1, y+1, z]) *fact
    return dst

Timings

src=np.random.rand(1024,1024,3)
dst=np.empty((1024,1024,3))

%timeit blur_image(src, dst)
8.08 s ± 52.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit blur_image_nb(src, dst)
5.19 ms ± 954 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit blur_image_nb_2(src, dst)
3.13 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#Just for comparison
%timeit res=np.copy(src)
11.1 ms ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.copyto(dst, src)
2.44 ms ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

import numpy as np
from PIL import Image, ImageFilter
import scipy.signal as ss

src=(np.random.rand(1024,1024,3)*255).astype(np.uint8)
img = Image.fromarray(src.astype(np.uint8))

%timeit img.filter(ImageFilter.BoxBlur(radius=3))
16.1 ms ± 65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

kernel = np.ones((3, 3)) / 9  # The 'boxcar'.
#3x in a example where src.shape[2]==3
%timeit ss.convolve(src[:,:,0], kernel, mode='same')
76.6 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Conclusion

Normally you would simply use functions which are already available, like @kwinkunks showed in his answer, but simply compling your function shows better performance than PIL (which also has some limitation regarding the datatype), and a more general convolution.

Upvotes: 0

Matt Hall

Reputation: 8122

If I'm reading your code correctly, you are describing filtering with a 3 × 3 boxcar filter. If you care about performance, it's probably a good idea to use an existing library for this, because it will be much more optimized than anything you can easily implement.

For example, what you're doing can be achieved with convolution, which is available in scipy.signal. So you could do this for each channel, then stack the resulting arrays:

import numpy as np
import scipy.signal as ss

arr = np.random.random((100, 100))  # Some fake data.
kernel = np.ones((3, 3)) / 9  # The 'boxcar'.

ss.convolve(arr, kernel, mode='same')

Blurring is also available in PIL:

import numpy as np
from PIL import Image, ImageFilter

arr = np.random.randint(0, 256, size=((100, 100, 3)), dtype=np.uint8)
img = Image.fromarray(arr)

img.filter(ImageFilter.BoxBlur(radius=3))

There are lots of other ways to do it, including doing it in the Fourier space (one example).

One nice feature of these other approaches is that they don't hard-code the filter design into the code, so it's easy to try another kernel, like a Gaussian for example.

Upvotes: 1

Vectorize a Python function that blurs images

Answers (2)

Simply compile it

Related Questions