Reputation: 31206
I have the following parameters defined for doing a max pool over the depth of the image (RGB) to compress it before the dense layer and readout, and I am failing with an error that says I cannot pool over the depth dimension:
sunset_poolmax_1x1x3_div_2x2x3_params = \
    {'pool_function': tf.nn.max_pool,
     'ksize': [1, 1, 1, 3],
     'strides': [1, 1, 1, 3],
     'padding': 'SAME'}
I changed the strides to [1, 1, 1, 3] so that depth is the only dimension reduced by the pool, but it still doesn't work. I can't get good results with the tiny image I have to compress everything down to in order to keep the colors...
Actual Error:
ValueError: Current implementation does not support pooling in the batch and depth dimensions.
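For reference, a minimal reproduction of the failure might look something like this (the images tensor is a made-up stand-in; only the ksize/strides over the depth dimension matter):
import tensorflow as tf

# Hypothetical batch of RGB images: (batch, height, width, channels)
images = tf.random.uniform((8, 32, 32, 3))

# Pooling only over the depth (channel) dimension; on the TensorFlow
# version used in the question this raises:
# ValueError: Current implementation does not support pooling in the batch
# and depth dimensions.
output = tf.nn.max_pool(images,
                        ksize=[1, 1, 1, 3],
                        strides=[1, 1, 1, 3],
                        padding='SAME')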
Upvotes: 9
Views: 8811
Reputation: 48436
You can use a custom Keras layer:
class DepthPool(tf.keras.layers.Layer):
    def __init__(self, pool_size=2, **kwargs):
        super().__init__(**kwargs)
        self.pool_size = pool_size

    def call(self, inputs):
        # Group the channels into blocks of pool_size, then take the max
        # within each block.
        old_shape = tf.shape(inputs)
        num_channels = old_shape[-1]
        num_channel_groups = num_channels // self.pool_size
        new_shape = tf.concat(
            [old_shape[:-1], [num_channel_groups, self.pool_size]], axis=0)
        reshaped_inputs = tf.reshape(inputs, new_shape)
        return tf.reduce_max(reshaped_inputs, axis=-1)
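A quick shape check (the input tensor here is just a random example):
x = tf.random.uniform((2, 8, 8, 6))   # hypothetical input with 6 channels
y = DepthPool(pool_size=3)(x)
print(y.shape)                        # (2, 8, 8, 2): every 3 channels max-pooled into 1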
Notes:
- there is no strides argument: the stride is assumed to be equal to the pool size
- the tf.nn.max_pool() operation supports depthwise pooling (see my other answer), but it only works on the CPU, so this custom layer is generally better
Upvotes: 0
Reputation: 89
This is an excerpt from the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. Keras does not include a depthwise max pooling layer, but TensorFlow’s low-level Deep Learning API does: just use the tf.nn.max_pool() function, and specify the kernel size and strides as 4-tuples (i.e., tuples of size 4). The first three values of each should be 1: this indicates that the kernel size and stride along the batch, height, and width dimensions should be 1. The last value should be whatever kernel size and stride you want along the depth dimension—for example, 3 (this must be a divisor of the input depth; it will not work if the previous layer outputs 20 feature maps, since 20 is not a multiple of 3):
output = tf.nn.max_pool(images,
                        ksize=(1, 1, 1, 3),
                        strides=(1, 1, 1, 3),
                        padding="VALID")
If you want to include this as a layer in your Keras models, wrap it in a Lambda layer (or create a custom Keras layer):
depth_pool = keras.layers.Lambda(
    lambda X: tf.nn.max_pool(X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3),
                             padding="VALID"))
Upvotes: 0
Reputation: 48436
TensorFlow now supports depth-wise max pooling with tf.nn.max_pool(). For example, here is how to implement it using pooling kernel size 3, stride 3 and VALID padding:
import tensorflow as tf

output = tf.nn.max_pool(images,
                        ksize=(1, 1, 1, 3),
                        strides=(1, 1, 1, 3),
                        padding="VALID")
You can use this in a Keras model by wrapping it in a Lambda layer:
from tensorflow import keras

depth_pool = keras.layers.Lambda(
    lambda X: tf.nn.max_pool(X,
                             ksize=(1, 1, 1, 3),
                             strides=(1, 1, 1, 3),
                             padding="VALID"))
model = keras.models.Sequential([
    ...,  # other layers
    depth_pool,
    ...   # other layers
])
Alternatively, you can write a custom Keras layer:
class DepthMaxPool(keras.layers.Layer):
    def __init__(self, pool_size, strides=None, padding="VALID", **kwargs):
        super().__init__(**kwargs)
        if strides is None:
            strides = pool_size
        self.pool_size = pool_size
        self.strides = strides
        self.padding = padding

    def call(self, inputs):
        return tf.nn.max_pool(inputs,
                              ksize=(1, 1, 1, self.pool_size),
                              strides=(1, 1, 1, self.strides),
                              padding=self.padding)
You can then use it like any other layer:
model = keras.models.Sequential([
    ...,  # other layers
    DepthMaxPool(3),
    ...   # other layers
])
Upvotes: 2
Reputation: 88
Here is a brief example for the original question in TensorFlow. I tested it on a stock RGB image of size 225 x 225 with 3 channels.
Import the standard libraries and enable eager_execution to quickly view the results.
import tensorflow as tf
from scipy.misc import imread
import matplotlib.pyplot as plt
import numpy as np
tf.enable_eager_execution()
Read the image and cast it from uint8 to tf.float32.
x = tf.cast(imread('tiger.jpeg'), tf.float32)
x = tf.reshape(x, shape=[-1, x.shape[0], x.shape[1], x.shape[2]])
print(x.shape)
input_channels = x.shape[3]
Create the filter for depthwise convolution
filters = tf.contrib.eager.Variable(tf.random_normal(shape=[3, 3, input_channels, 4]))
print(x.shape)
Perform the depthwise convolution with a channel multiplier of 4, so the 3 input channels yield 3 * 4 = 12 feature maps. Note that the padding has been kept as 'SAME'; it can be changed at will.
x = tf.nn.depthwise_conv2d(input=x, filter=filters, strides=[1, 1, 1, 1], padding='SAME', name='conv_1')
print(x.shape)
Perform the max_pooling2d. Since the output size of the pooling layer is (input_size - pool_size + 2 * padding)/stride + 1 and the padding is 'valid', we should get an output of (225 - 2 + 0)/1 + 1 = 224.
x = tf.layers.max_pooling2d(inputs=x, pool_size=2, strides=1,padding='valid', name='maxpool1')
print(x.shape)
Plot the figures to confirm.
fig, ax = plt.subplots(nrows=4, ncols=3)
q = 0
for ii in range(4):
    for jj in range(3):
        ax[ii, jj].imshow(np.squeeze(x[:, :, :, q]))
        ax[ii, jj].set_axis_off()
        q += 1
plt.tight_layout()
plt.show()
Upvotes: 1
Reputation: 1469
tf.nn.max_pool does not support pooling over the depth dimension, which is why you get an error.
You can use a max reduction instead to achieve what you're looking for:
tf.reduce_max(input_tensor, reduction_indices=[3], keep_dims=True)
The keep_dims parameter above ensures that the rank of the tensor is preserved, so the behavior of the max reduction stays consistent with what the tf.nn.max_pool operation would do if it supported pooling over the depth dimension.
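For instance, a minimal sketch (the input tensor is made up for illustration; note that in newer TensorFlow versions these arguments are named axis and keepdims):
import tensorflow as tf

# Hypothetical batch of 32x32 RGB images
images = tf.random.uniform((8, 32, 32, 3))

# Max over the depth (channel) dimension, keeping a size-1 channel axis
pooled = tf.reduce_max(images, axis=3, keepdims=True)
print(pooled.shape)  # (8, 32, 32, 1)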
Upvotes: 11