Yong Wang

Reputation: 69

Why does global average pooling work in ResNet?

Recently, I started a classification project using a very shallow ResNet. The model has just 10 conv layers, followed by a global average pooling layer and then the softmax layer.
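
For reference, the classification head looks roughly like this (a minimal sketch; the class count is a placeholder and the conv backbone is omitted):

    import tensorflow as tf

    NUM_CLASSES = 10  # placeholder; the real class count is different

    # feature map from the last conv layer: [-1, 128, 1, 32] (TensorFlow NHWC form)
    features = tf.keras.layers.Input(shape=(128, 1, 32))
    pooled = tf.keras.layers.GlobalAveragePooling2D()(features)                # -> [-1, 32]
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(pooled)
    head = tf.keras.Model(features, outputs)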

The performance is as good as I expected --- 93% (yeah, it is OK).

However, for certain reasons, I need to replace the global average pooling layer.

I have tried the following ways:

(The input shape of this layer is [-1, 128, 1, 32], in TensorFlow format.)

  1. A global max pooling layer, but only got 85% accuracy (roughly sketched after this list).

  2. An exponential moving average over axis 1, but got 12% (it almost didn't work):

     # split axis 1 into 128 slices of shape [-1, 1, 1, 32]
     split_list = tf.split(input, 128, axis=1)
     avg_pool = split_list[0]
     beta = 0.5
     # exponentially weighted moving average over the 128 slices
     for i in range(1, 128):
         avg_pool = beta * split_list[i] + (1 - beta) * avg_pool
     avg_pool = tf.reshape(avg_pool, [-1, 32])
    
  3. Splitting the input into 4 parts along axis 1, pooling each part, and finally concatenating them, but got 75%:

     # split axis 1 into 4 chunks of shape [-1, 32, 1, 32]
     split_shape = [32, 32, 32, 32]
     split_list = tf.split(input, split_shape, axis=1)
     # pool each chunk down to [-1, 32], then concatenate to [-1, 128]
     for i in range(len(split_shape)):
         split_list[i] = tf.keras.layers.GlobalMaxPooling2D()(split_list[i])
     avg_pool = tf.concat(split_list, axis=1)
    
  4. Averaging over the last axis, [-1, 128, 1, 32] --> [-1, 128], but it didn't work (also sketched after this list).

  5. A conv layer with a single filter, so that the output shape is [-1, 128, 1, 1], but it didn't work either, around 25%.
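
For attempts 1 and 4, which have no code above, what I did looks roughly like this (the dummy batch size of 4 is just for illustration):

    import tensorflow as tf

    # dummy batch with the same shape as the real feature maps: [-1, 128, 1, 32]
    x = tf.random.normal([4, 128, 1, 32])

    # attempt 1: global max pooling instead of global average pooling
    max_pool = tf.keras.layers.GlobalMaxPooling2D()(x)   # -> [4, 32]

    # attempt 4: average over the last axis only
    chan_avg = tf.reduce_mean(x, axis=-1)                # -> [4, 128, 1]
    chan_avg = tf.reshape(chan_avg, [-1, 128])           # -> [4, 128]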

I am pretty confused about why global average pooling works so well. And is there any other way to replace it?

Upvotes: 4

Views: 7258

Answers (1)

user8879803


Global Average Pooling has the following advantages over the fully connected final layers paradigm:

  1. The removal of a large number of trainable parameters from the model. Fully connected (dense) layers have lots of parameters: flattening a 7 x 7 x 64 CNN output and feeding it into a 500-node dense layer yields 7 x 7 x 64 x 500 ≈ 1.57 million weights that need to be trained. Removing these layers speeds up the training of your model (see the parameter-count sketch after this list).
  2. Eliminating all these trainable parameters also reduces the tendency to over-fit, which otherwise has to be managed in fully connected layers by the use of dropout.
  3. The authors argue in the original paper that removing the fully connected classification layers forces the feature maps to be more closely related to the classification categories – so that each feature map becomes a kind of “category confidence map”.
  4. Finally, the authors also argue that the averaging operation over each feature map makes the model more robust to spatial translations in the data. In other words, as long as the requisite feature is activated somewhere in the feature map, it will still be "picked up" by the averaging operation (a tiny demonstration follows the sketch below).
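
As a rough illustration of point 1, you can compare the two kinds of head directly in Keras (assuming the 7 x 7 x 64 feature map and 500-node dense layer mentioned above):

    import tensorflow as tf

    # flattening a 7 x 7 x 64 output into a 500-node dense layer:
    # 7*7*64*500 weights + 500 biases = 1,568,500 trainable parameters
    dense_head = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(7, 7, 64)),
        tf.keras.layers.Dense(500),
    ])
    print(dense_head.count_params())  # 1568500

    # global average pooling over the same output adds no trainable parameters at all
    gap_head = tf.keras.Sequential([
        tf.keras.layers.GlobalAveragePooling2D(input_shape=(7, 7, 64)),
    ])
    print(gap_head.count_params())  # 0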

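And a tiny demonstration of point 4 (the shapes and values are made up purely for illustration): the same feature activated at two different spatial positions gives exactly the same pooled value, whereas a flattened vector puts the activation at different indices, which a dense layer would treat as completely different inputs.

    import tensorflow as tf

    # one 6 x 6 single-channel feature map with an activation at (1, 1),
    # and another with the same activation shifted to (4, 4)
    a = tf.scatter_nd([[0, 1, 1, 0]], [1.0], shape=[1, 6, 6, 1])
    b = tf.scatter_nd([[0, 4, 4, 0]], [1.0], shape=[1, 6, 6, 1])

    gap = tf.keras.layers.GlobalAveragePooling2D()
    print(gap(a).numpy(), gap(b).numpy())   # identical outputs: the shift is averaged away

    # flattened, the activation lands at index 7 vs index 28
    print(tf.argmax(tf.reshape(a, [1, -1]), axis=1).numpy(),
          tf.argmax(tf.reshape(b, [1, -1]), axis=1).numpy())
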
Upvotes: 6
