Frederic Bastiat

Reputation: 693

Vectorization with numpy

I am trying to create a Gaussian blurred matrix. I am modifying code from http://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html

dev_data has rows of 784 pixel features, and I would like to blur each pixel by averaging it with the neighbors around it. Along the outer edges (rows 0 and -1, columns 0 and -1), any out-of-bounds neighbors should be discarded. I am not quite sure how to do this discarding.

Code:

# Initialize a new feature array with the same shape as the original data.
blurred_dev_data = np.zeros(dev_data.shape)

#we will reshape the 784 feature-long rows into 28x28 matrices
for i in range(dev_data.shape[0]):
    reshaped_dev_data = np.reshape(dev_data[i], (28,28))
    #the purpose of the reshape is to use the average of the 8 pixels + the pixel itself to blur
    for idx, pixel in enumerate(reshaped_dev_data):
        pixel = np.mean(reshaped_dev_data[idx-1:idx-1,idx-1:idx-1] + reshaped_dev_data[idx-1:idx-1,idx:idx] + reshaped_dev_data[idx-1:idx-1,idx+1:] +
             reshaped_dev_data[idx:idx,idx-1:idx-1] + reshaped_dev_data[idx:idx,idx:idx] + reshaped_dev_data[idx:idx,idx+1:] +
             reshaped_dev_data[idx+1:  ,idx-1:idx-1] + reshaped_dev_data[idx+1:  ,idx:idx] + reshaped_dev_data[idx+1:  ,idx+1:])
    blurred_dev_data[i,:] = reshaped_dev_data.ravel()

I get an error with the index:

ValueError: operands could not be broadcast together with shapes (0,0) (0,27)

It's not an IndexError, so I'm not quite sure what I'm doing wrong here or how to fix it.

Upvotes: 0

Views: 720

Answers (1)

Mateen Ulhaq

Reputation: 27191

Try this:

pixel = np.mean(reshaped_dev_data[idx-1:idx+2, idx-1:idx+2])

Also, read about slicing.
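To make the slicing point concrete, here is a small sketch that reproduces the two shapes from the error message. The array name image and the sample index values are made up for illustration; only the slice expressions come from the question.

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)
idx = 0  # first iteration of the loop in the question

# start == stop selects nothing, so this slice is empty.
empty = image[idx-1:idx-1, idx-1:idx-1]
# The open-ended column slice keeps 27 columns, but the row slice is still empty.
tail = image[idx-1:idx-1, idx+1:]
print(empty.shape, tail.shape)  # (0, 0) (0, 27)

# The stop index is exclusive, so a 3-wide window needs stop = idx + 2.
idx = 5
window = image[idx-1:idx+2, idx-1:idx+2]
print(window.shape)  # (3, 3)

# At the border, idx - 1 is -1, which counts from the end; clamp it to 0.
idx = 0
corner = image[max(idx-1, 0):idx+2, max(idx-1, 0):idx+2]
print(corner.shape)  # (2, 2)
```

Adding the (0, 0) and (0, 27) arrays is what raises the ValueError: neither shape can be broadcast to the other.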


So I looked further at your code, and you're doing a few things wrong:

  • This is not a Gaussian kernel.
  • Recomputing reshaped_dev_data multiple times in a loop.
  • Looping over the wrong things.
  • Trying to mutate pixel on line 9. This is bad because:
    • Mutation of the object you are looping over is generally bad
    • That will not mutate anyways! pixel is like a "value" holder. Changing it does not change the array you are looping over.
  • Not writing vectorized code!
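The point about pixel being a "value" holder can be checked with a tiny example. This is only an illustration on a made-up 1-D array, not the question's data:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

# Rebinding the loop variable only changes the local name, not the array.
for value in arr:
    value = value * 10
after_rebinding = arr.copy()  # still [1., 2., 3.]

# Assigning through an index writes into the array itself.
for i, value in enumerate(arr):
    arr[i] = value * 10
after_indexing = arr.copy()  # now [10., 20., 30.]
```

So to store a blurred value, write it to an output array by index rather than assigning to the loop variable.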

Here's a naive, non-vectorized way of doing it:

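def conv(arr, i, j):
    # Mean of the 3x3 window centered on (i, j); the stop index is exclusive.
    # Borders are not handled yet.
    return np.mean(arr[i-1:i+2, j-1:j+2])

# Assumes dev_data is a single row of 784 features
reshaped_dev_data = dev_data.reshape(28, 28)
blurred_dev_data = np.zeros_like(reshaped_dev_data)

for i, row in enumerate(reshaped_dev_data):
    for j, pixel in enumerate(row):
        blurred_dev_data[i, j] = conv(reshaped_dev_data, i, j)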

Notice that we're doing a convolution, so we could simply use an existing library to convolve the image with an averaging kernel.
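As a rough sketch of what the vectorized form could look like with NumPy alone: pad the image with zeros, add up the nine shifted views, and divide by the number of in-bounds neighbors so that out-of-bounds ones are discarded. The function name box_blur_3x3 is made up for this example, and this is one possible approach rather than the only one.

```python
import numpy as np

def box_blur_3x3(image):
    """Mean of each pixel's 3x3 neighborhood, ignoring out-of-bounds neighbors."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    # Zero padding lets every pixel have a full 3x3 window to slice.
    padded = np.pad(image, 1, mode="constant", constant_values=0.0)
    # Padded array of ones: summing its windows counts the in-bounds neighbors.
    valid = np.pad(np.ones_like(image), 1, mode="constant", constant_values=0.0)
    total = np.zeros_like(image)
    count = np.zeros_like(image)
    for di in range(3):
        for dj in range(3):
            total += padded[di:di + rows, dj:dj + cols]
            count += valid[di:di + rows, dj:dj + cols]
    return total / count

small = np.arange(9).reshape(3, 3)
blurred_small = box_blur_3x3(small)
# Corner (0, 0) averages 0, 1, 3, 4; the center averages all nine values.
```

The two short loops run nine times regardless of image size, so the per-pixel work happens inside NumPy.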


Regarding your comments,

def conv(arr, i, j):
    # Clamp the 3x3 window to the 28x28 image so out-of-bounds neighbors are dropped
    a = max(i-1, 0)
    b = min(i+2, 28)
    c = max(j-1, 0)
    d = min(j+2, 28)

    return np.mean(arr[a:b, c:d])

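# Float output so the means are not truncated if dev_data holds integers
blurred_dev_data = np.zeros_like(dev_data, dtype=float)

for n, data in enumerate(dev_data):
    # Each 784-feature row becomes one 28x28 image
    reshaped = data.reshape(28, 28)
    blurred = np.zeros_like(reshaped, dtype=float)

    for i, row in enumerate(reshaped):
        for j, pixel in enumerate(row):
            blurred[i, j] = conv(reshaped, i, j)

    blurred_dev_data[n] = blurred.ravel()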

Notice I modified conv because I'd forgotten to ensure boundaries were not exceeded.

Note: It is much, much faster to use existing libraries such as SciPy or OpenCV to perform the 2D convolution, or in this case, a mean (box) filter.
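If pulling in another library is not an option, the whole dev_data array can also be blurred in one go with plain NumPy. The sketch below assumes dev_data has shape (N, 784); the function name blur_batch and its side parameter are invented for this example.

```python
import numpy as np

def blur_batch(data, side=28):
    """3x3 mean blur of every row of data, each row a flattened side x side image."""
    images = np.asarray(data, dtype=float).reshape(-1, side, side)
    # Pad only the two image axes, never the batch axis.
    padded = np.pad(images, ((0, 0), (1, 1), (1, 1)),
                    mode="constant", constant_values=0.0)
    # Padded ones: window sums give the number of in-bounds neighbors per pixel.
    valid = np.pad(np.ones((side, side)), 1, mode="constant", constant_values=0.0)
    total = np.zeros_like(images)
    count = np.zeros((side, side))
    for di in range(3):
        for dj in range(3):
            total += padded[:, di:di + side, dj:dj + side]
            count += valid[di:di + side, dj:dj + side]
    # count broadcasts across the batch axis.
    return (total / count).reshape(len(images), side * side)
```

Usage would be something like blurred_dev_data = blur_batch(dev_data), which returns an array with the same (N, 784) shape as the input.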

Upvotes: 1
