Slayre

Reputation: 25

How to perform max pooling operation over 3D convolution array?

I'm building a convolutional neural network with numpy, and I'm not sure that my pooling treatment of the 3D (HxWxD) input image is correct.

As an example, I have an image shaped (12x12x3), which I convolve to (6x6x3), and I want to perform max pooling such that I obtain a (3x3x3) image. To do this, I choose a filter size of (2x2) and a stride of 2.

output_size = int((conv.shape[0]-F)/S + 1)
pool = np.zeros((output_size,output_size,3)) # pool array
for k in range(conv.shape[-1]): # loop over conv depth
    i_stride = 0
    for i in range(output_size):
        j_stride = 0
        for j in range(output_size):
            pool[i,j,k] = np.amax(conv[i_stride:i_stride+F,
                                       j_stride:j_stride+F,k],0)
            j_stride+=S
        i_stride+=S

For the first channel of my convolution array, conv[:,:,0], I obtain the image below. Comparing this with the first channel of the max pooling array, pool[:,:,0], I get the second image. At a glance I can tell that the pooling operation is not correct: conv[0:2,0:2,0] (mostly gray) is most definitely not pool[0,0,0] (black); you'd expect it to be one of the shades of gray. So I'm convinced that something is definitely wrong here, either in my for loop or in the two comparisons I'm making.

If anyone can help me better understand the pooling operation over the array with 3 dimensions, that will definitely help.

Upvotes: 1

Views: 2129

Answers (1)

Display name

Reputation: 658

Max pooling produces an output with the same depth as its input. With that in mind, we can focus on a single slice (along depth) of the input conv. A single slice at an arbitrary depth index is just an NxN image. You defined a filter size of 2 and a stride of 2. Max pooling does nothing more than iterate over the input image and take the maximum of the current window ("subimage").

import numpy as np

F = 2
S = 2
conv = np.array(
    [
        [
            [[.5, .1], [.1, .0], [.2, .7], [.1, .3], [.0, .1], [.3, .8]],
            [[.0, .9], [.5, .7], [.3, .1], [.9, .2], [.8, .7], [.1, .9]],
            [[.1, .8], [.1, .2], [.6, .2], [.0, .3], [.1, .3], [.0, .8]],
            [[.0, .6], [.6, .4], [.2, .8], [.6, .8], [.9, .1], [.3, .1]],
            [[.3, .9], [.7, .6], [.7, .6], [.5, .4], [.7, .2], [.8, .1]],
            [[.1, .8], [.9, .3], [.2, .7], [.8, .4], [.0, .5], [.8, .0]]
        ],
        [
            [[.1, .2], [.1, .0], [.5, .3], [.0, .4], [.0, .5], [.0, .6]],
            [[.3, .6], [.6, .4], [.1, .2], [.6, .2], [.2, .3], [.2, .4]],
            [[.2, .1], [.4, .2], [.0, .4], [.5, .6], [.7, .6], [.7, .2]],
            [[.0, .7], [.5, .3], [.4, .0], [.4, .6], [.2, .2], [.2, .7]],
            [[.0, .5], [.3, .0], [.3, .8], [.3, .2], [.6, .3], [.5, .2]],
            [[.6, .2], [.2, .5], [.5, .4], [.1, .0], [.2, .6], [.1, .8]]
        ]
    ])

number_of_images, image_height, image_width, image_depth = conv.shape
output_height = (image_height - F) // S + 1
output_width = (image_width - F) // S + 1

pool = np.zeros((number_of_images, output_height, output_width, image_depth))
for k in range(number_of_images):
    for i in range(output_height):
        for j in range(output_width):
            # max over the FxF window only (axes 0 and 1); depth stays separate,
            # so each channel is pooled independently
            pool[k, i, j, :] = np.max(conv[k, i*S:i*S+F, j*S:j*S+F, :], axis=(0, 1))

print(pool[0, :, :, 0])
[[0.5 0.9 0.8]
 [0.6 0.6 0.9]
 [0.9 0.8 0.8]]
print(pool[0, :, :, 1])
[[0.9 0.7 0.9]
 [0.8 0.8 0.8]
 [0.9 0.7 0.5]]
print(pool[1, :, :, 0])
[[0.6 0.6 0.2]
 [0.5 0.5 0.7]
 [0.6 0.5 0.6]]
print(pool[1, :, :, 1])
[[0.6 0.4 0.6]
 [0.7 0.6 0.7]
 [0.5 0.8 0.8]]
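As an aside (not part of the original question), when the window size equals the stride and the spatial dimensions divide evenly, as in your setup, the same pooling can be written without loops by splitting each spatial axis into (blocks, within-block) and reducing over the within-block axes; a sketch with made-up random data:

```python
import numpy as np

F = S = 2  # window size equals stride; H and W are divisible by F

rng = np.random.default_rng(0)
conv = rng.random((2, 6, 6, 3))  # (batch, height, width, depth), arbitrary example

n, h, w, d = conv.shape
# Reshape to (n, h//F, F, w//F, F, d) and max over the two within-window axes.
# Depth is untouched, so each channel pools independently.
pool = conv.reshape(n, h // F, F, w // F, F, d).max(axis=(2, 4))
print(pool.shape)  # (2, 3, 3, 3)
```

This produces the same result as the triple loop, just vectorized.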

It's not clear to me why you use np.amax(..., 0) for a single element of pool: with axis 0 it reduces only the rows of the window and returns the per-column maxima (an array), not the single scalar a pooled element should be.
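To illustrate (a minimal example, not from your post), on one 2x2 pooling window the axis argument changes what np.amax returns:

```python
import numpy as np

window = np.array([[0.5, 0.1],
                   [0.0, 0.9]])  # one 2x2 pooling window

print(np.amax(window, 0))  # [0.5 0.9] -- per-column maxima, still an array
print(np.amax(window))     # 0.9 -- a single scalar, which is what max pooling needs
```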

Upvotes: 1
