Reputation: 662
I am trying to create a 5x5 mean filter to remove some salt and pepper noise from an image. I read the image into a numpy array and tried to replace each pixel with the average value of its neighbors. The result is quite bad, and I can't figure out why there are gaps in the output image.
from PIL import Image
import numpy

image1 = 'noisy.jpg'
save1 = 'filtered.jpg'

def average(path, name):
    temp = Image.open(path)
    image_array = numpy.array(temp)
    new_image = []
    for i in range(0, len(image_array)):
        new_image.append([])
    n = 0
    average_sum = 0
    for i in range(0, len(image_array)):
        for j in range(0, len(image_array[i])):
            for k in range(-2, 3):
                for l in range(-2, 3):
                    if (len(image_array) > (i + k) >= 0) and (len(image_array[i]) > (j + l) >= 0):
                        average_sum += image_array[i+k][j+l]
                        n += 1
            new_image[i].append(int(round(average_sum/n)))
            average_sum = 0
            n = 0
    x = Image.fromarray(numpy.array(new_image), 'L')
    x.save(name)
    print("done")

average(image1, save1)
---------------------Input image-----------------
---------------------Output image-----------------
Upvotes: 2
Views: 10030
Reputation: 4510
I just want to warn anyone else who finds this page. Basically, nobody should ever do somenumpyarray[y,x] to directly access pixel values one by one. Every time you type something like that, Numpy has to create 4 new Python objects (the tuple object containing RGB values, and the three individual int objects for each R/G/B value). That's because in Python, everything is an object (even numbers are objects), which means that data can't be "just read directly from Numpy". Actual Python objects have to be created instead, and the Numpy data (such as numbers) are copied into those objects, every time you try to read something from Numpy into Python.
That's 4 Python object creations per pixel you try to read from the array. For a 1080p image if you ONLY read each pixel ONCE that's 8 294 400 objects. But the code above checks a 5x5 (25 pixels) matrix around each pixel, so that's 207 360 000 object creations! Insane!
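To double-check those numbers, here is the arithmetic spelled out. The figure of 4 objects per pixel read is just the estimate used above, not something measured; the variable names are only for illustration.

```python
# Worked arithmetic for the object-count estimate above.
# objects_per_read = 4 is the estimate from the text, not a measurement.
width, height = 1920, 1080
objects_per_read = 4
window = 5 * 5  # every output pixel reads a 5x5 neighbourhood

pixels = width * height
single_pass = pixels * objects_per_read
windowed_pass = single_pass * window

print(pixels)         # 2073600
print(single_pass)    # 8294400
print(windowed_pass)  # 207360000
```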
This object creation is known as boxing (taking native Numpy data and packing/boxing it in a Python object data structure). The reverse is unboxing (taking Python data, extracting the actual value it contains, such as a number, and packing it into the native Numpy array). Reading / writing values in a Numpy array from Python always involves boxing and unboxing, which is why it's extremely slow and you should always use native Numpy methods for operating on your data instead. Numpy is NOT a general-purpose "array" for us to treat like any random-access Python List. Numpy is meant for vector / matrix operations using its own built-in functions! In fact, you should never even do for X in some_ndarray because iterating invokes the same slow-ass boxing process (every X item in such a loop is extracted from Numpy and boxed).
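If you'd rather stay in NumPy, here is a rough sketch of the same 5x5 average done with whole-array operations. It assumes a 2-D grayscale uint8 array; box_mean and its radius parameter are names I made up for this example, and like the question's code it averages only the neighbours that fall inside the image.

```python
import numpy as np

def box_mean(image, radius=2):
    """Mean over a (2*radius+1)^2 window, counting only in-bounds neighbours.

    Hypothetical helper for illustration; expects a 2-D grayscale array.
    """
    img = image.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="constant")                # zeros outside the image
    valid = np.pad(np.ones_like(img), radius, mode="constant")   # 1 inside, 0 outside
    total = np.zeros_like(img)
    count = np.zeros_like(img)
    size = 2 * radius + 1
    # 25 whole-array additions in total, instead of 25 element reads per pixel.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return np.rint(total / count).astype(np.uint8)

demo = np.full((6, 6), 100, dtype=np.uint8)
demo[3, 3] = 255             # one bright outlier
result = box_mean(demo)
print(result[3, 3])          # 106, i.e. (24 * 100 + 255) / 25 rounded
print(result[0, 0])          # 100, the corner only averages its 9 in-bounds pixels
```

The two small Python loops run 25 times regardless of image size, so the per-pixel work stays inside NumPy.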
Anyway... what you are trying to implement is a 5x5 "box blur", meaning each pixel is replaced by the average of all pixels in the 5x5 square window centered on it.
You should therefore use a native C++ library which performs it all in pure, clean RAM, without involving Python at all. One such library is OpenCV, which accepts an ndarray (your image pixels), and internally reads directly from the RAM owned by the ndarray, and operates directly on each pixel natively.
Here's the code:
import cv2
path = "noisy.jpg"
img = cv2.imread(path)
img = cv2.blur(img, (5,5)) # This is now your box-blurred image.
Benchmarks in a 1920x1080 image:
Never, ever access Numpy array elements directly. It's not meant for that.
Good luck to any future readers.
Edit: By the way, to answer the original question... to remove salt and pepper noise, you should use median filters instead of box blurs.
Input:
5x5 Box Blur (aka Mean/Average Blur):
img = cv2.blur(img, (5,5))
3x3 Median Blur:
img = cv2.medianBlur(img, 3)
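To see why the median works better here, this is a small NumPy-only sketch on a toy image with a single "salt" pixel. It is not how OpenCV implements either filter; median3x3 and mean3x3 are made-up helper names, the edge padding is my own choice, and sliding_window_view needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median3x3(image):
    # Hypothetical helper: 3x3 median with edge-replicated borders.
    padded = np.pad(image, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))   # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)

def mean3x3(image):
    # Hypothetical helper: 3x3 mean with the same border handling.
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.rint(windows.mean(axis=(-2, -1))).astype(image.dtype)

flat = np.full((5, 5), 100, dtype=np.uint8)
flat[2, 2] = 255            # a single "salt" pixel

med = median3x3(flat)
avg = mean3x3(flat)
print(med[2, 2])            # 100, the outlier is discarded
print(avg[2, 2])            # 117, the outlier is smeared into its neighbours
```

The median throws the outlier away because it never lands in the middle of the sorted window, while the mean spreads it over every pixel whose window contains it.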
Upvotes: 8
Reputation: 1978
Instead of creating a list for the new image, just make a copy of the original image (or create an array with the same size as the original image) and overwrite its values with the new averaged pixel values. Like this:
def average(path, name):
    temp = Image.open(path)
    image_array = numpy.array(temp)
    # Copying keeps the original shape and dtype for the output image.
    new_image = image_array.copy()
    n = 0
    average_sum = 0
    for i in range(0, len(image_array)):
        for j in range(0, len(image_array[i])):
            for k in range(-2, 3):
                for l in range(-2, 3):
                    if (len(image_array) > (i + k) >= 0) and (len(image_array[i]) > (j + l) >= 0):
                        # Cast to a Python int so the running sum cannot wrap around in the array's 8-bit dtype.
                        average_sum += int(image_array[i+k][j+l])
                        n += 1
            new_image[i][j] = (int(round(average_sum/n)))
            average_sum = 0
            n = 0
    x = Image.fromarray(numpy.array(new_image), 'L')
    x.save(name)
    print("done")
This gave me the following output:
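My guess at why the list-based version produced gaps (this is my own reading, not something I have verified against Pillow's source): numpy.array() on a list of Python ints typically yields a wide integer dtype such as int64, whereas an 'L' image expects one byte per pixel. The sketch below only shows the dtype difference and what the raw bytes look like; rows, wide and narrow are made-up names.

```python
import numpy

rows = [[10, 20], [30, 40]]          # what a list of Python ints looks like

wide = numpy.array(rows)                         # default integer dtype, usually 64-bit
narrow = numpy.array(rows, dtype=numpy.uint8)    # one byte per pixel, as 'L' expects

print(wide.dtype, wide.nbytes)
print(narrow.dtype, narrow.nbytes)

# Reinterpreting the wide buffer as single bytes: each pixel value is
# surrounded by zero bytes, which would show up as dark gaps in an 8-bit image.
raw = numpy.frombuffer(wide.tobytes(), dtype=numpy.uint8)
print(raw)
```

Copying the original array sidesteps this because the copy keeps the image's own dtype.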
Upvotes: 2