Reputation: 2079
As I understand it, global average pooling should speed up training. But for some reason it doesn't. I used the Horse or Human dataset. Here's my code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

target_size = (160, 160)
batch_size = 100

data_generator = ImageDataGenerator(
    zoom_range=0.1,
    shear_range=0.1,
    rotation_range=30,
    brightness_range=[0.8, 1.2],
    channel_shift_range=0.1,
    horizontal_flip=True,
)
train_generator = data_generator.flow_from_directory(
    'data/horse-or-human/train',
    class_mode='binary',
    target_size=target_size,
    batch_size=batch_size,
)
val_generator = data_generator.flow_from_directory(
    'data/horse-or-human/validation',
    class_mode='binary',
    target_size=target_size,
    batch_size=batch_size,
)
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Flatten, Dense, GlobalAvgPool2D

base_model = VGG16(
    include_top=False,
    input_shape=(*target_size, 3)
)
for layer in base_model.layers:
    layer.trainable = False

layer = Flatten()(base_model.output)
layer = Dense(512, activation='relu')(layer)
out = Dense(1, activation='sigmoid')(layer)

model = Model(inputs=base_model.input, outputs=out)
model.compile(optimizer=Adam(learning_rate=5e-6), loss='binary_crossentropy', metrics=['acc'])

history = model.fit(
    train_generator,
    epochs=150,
    steps_per_epoch=len(train_generator),
    validation_data=val_generator,
    validation_steps=len(val_generator))
Each epoch of training takes about 14 seconds for this model. The number of trainable parameters is 6,554,625.
base_model = VGG16(
    include_top=False,
    input_shape=(*target_size, 3)
)
for layer in base_model.layers:
    layer.trainable = False

layer = GlobalAvgPool2D()(base_model.output)
layer = Dense(512, activation='relu')(layer)
out = Dense(1, activation='sigmoid')(layer)

model = Model(inputs=base_model.input, outputs=out)
model.compile(optimizer=Adam(learning_rate=5e-6), loss='binary_crossentropy', metrics=['acc'])

history = model.fit(
    train_generator,
    epochs=150,
    steps_per_epoch=len(train_generator),
    validation_data=val_generator,
    validation_steps=len(val_generator))
This model has only 263,169 trainable parameters. But time per epoch is still around 14 seconds.
I tried a bigger target_size and other base models, but the time per epoch before and after adding GlobalAvgPool2D remains the same.
Please explain this behavior.
Upvotes: 0
Views: 827
Reputation: 24691
It's probably due to:

- Data loading and augmentation: the ImageDataGenerator pipeline does no caching or prefetching, and it probably takes most of the time, so the network run time is negligible. See Better performance with the tf.data API and Analyze tf.data performance with the TF Profiler for more info (a cached/prefetched pipeline sketch follows after this list).
- The Dense layer after GlobalAvgPool2D or Flatten is highly parallelized (it's basically a batched matrix multiplication) using optimized underlying C/C++ (for example OpenMP, MKL-DNN, CUDA kernels and many others, depending on the devices used), hence the larger matrix multiply isn't so severe (at least up to a point).
- Flatten is faster than GlobalAvgPooling, as it's a single reshape operation (reshape(batch_size, -1)), hence what is lost to the larger Dense might be gained here (see the timing sketch below).
- A much larger input_shape might make the difference, as the Dense weights wouldn't fit well in memory (either CUDA or CPU cache), but I expect the training is I/O bound due to image loading.
Upvotes: 1