Reputation: 503
I am trying to reduce the resolution of an image to speed up training, so I used tf.nn.max_pool on my raw image. I expected the result to be a blurred image of smaller size, but it is not.
My raw image has shape [320, 240, 3], and it looks like:
After max-pooling with ksize=[1, 2, 2, 1] and strides=[1, 2, 2, 1], it becomes the image above, produced by the following code:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# `img` is a numpy array with shape [320, 240, 3].
# tf.nn.max_pool expects a 4-D tensor of shape
# [batch_size, height, width, channels], so the image
# needs an extra (dummy) batch dimension.
img_tensor = tf.placeholder(tf.float32, shape=[1, 320, 240, 3])
pooled = tf.nn.max_pool(img_tensor, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='VALID')
pooled_img = pooled.eval(feed_dict={img_tensor: img.reshape([1, 320, 240, 3])})
plt.imshow(np.squeeze(pooled_img, axis=0))
The pooled image has shape [160, 120, 3], which is expected. It's just that the transformation behaviour really confuses me: there shouldn't be that "repeated shifting" artifact, since the pooling windows do not overlap.
Many thanks in advance.
Upvotes: 1
Views: 283
Reputation: 3159
I think the problem is how your image has been reshaped. This image actually has shape [240, 320, 3] (height 240, width 320), not [320, 240, 3].
So use [1, 240, 320, 3] instead of [1, 320, 240, 3]. It should work.
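To see why the wrong shape produces the "repeated shifting" artifact rather than an error: `reshape` never reorders data, it only reinterprets the flat buffer, so reshaping a [240, 320, 3] image to [1, 320, 240, 3] silently scrambles the rows. A minimal NumPy sketch (using a synthetic array in place of your image, which is an assumption about how it was loaded):

```python
import numpy as np

# Stand-in for the loaded image: height 240, width 320, 3 channels.
img = np.arange(240 * 320 * 3, dtype=np.float32).reshape(240, 320, 3)

# Correct: add a batch dimension without touching the pixel layout.
batched = img[np.newaxis, ...]   # shape (1, 240, 320, 3)

# Incorrect: same element count, so reshape "succeeds", but the
# flat buffer is reinterpreted with the wrong row length, shifting
# every row relative to the original picture.
scrambled = img.reshape(1, 320, 240, 3)

# The raw data is identical -- only the interpretation differs.
assert np.array_equal(scrambled.ravel(), img.ravel())
# But the pixels no longer line up with the original image.
assert not np.array_equal(scrambled[0, :240, :240], img[:240, :240])
```

With `img[np.newaxis, ...]` (or equivalently `img.reshape([1, 240, 320, 3])`) fed into the pooling op, the result is the expected downscaled image.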
Upvotes: 1