Reputation: 95
I've noticed a tremendous slowdown in model training speed when I specify the steps_per_epoch argument in the model.fit(...) method. When I leave steps_per_epoch as None (or don't pass it at all), the epoch's ETA is a steady 2 seconds:
9120/60000 [===>..........................] - ETA: 2s - loss: 0.7055 - acc: 0.7535
When I add the steps_per_epoch argument, the ETA jumps to over 5 hours and training becomes extremely slow:
5/60000 [..............................] - ETA: 5:50:00 - loss: 1.9749 - acc: 0.3437
Here is the reproducible script:
import tensorflow as tf
from tensorflow import keras
import time
print(tf.__version__)
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
model = get_model()
# Very quick - 2 seconds
start = time.time()
model.fit(train_images, train_labels, epochs=1)
end = time.time()
print("{} seconds", end - start)
model = get_model()
# Very slow - 5 hours
start = time.time()
model.fit(train_images, train_labels, epochs=1, steps_per_epoch=len(train_images))
end = time.time()
print("{} seconds", end - start)
I've also tried with standalone Keras and the problem persisted. I'm using TensorFlow 1.12.0, Python 3, and Ubuntu 18.04.1 LTS.
Why does the steps_per_epoch argument cause such a significant slowdown, and how can I avoid it?
Thanks!
Upvotes: 3
Views: 12898
Reputation: 86600
Notice that you're calling fit with an array of data, not fit_generator with a generator.
There is no point in passing steps_per_epoch here unless you have some unconventional use case.
The default batch size in fit is 32, which means you're training with 60000 // 32 = 1875 steps per epoch.
If you pass steps_per_epoch=1875, you train the same number of batches as with the default None. If you pass 60000 steps, you're multiplying the work of one epoch by 32. (Given the huge difference in speed, I would say the effective batch size also changes in this case.)
The total number shown in the output for fitting without steps is the total number of images. Notice how the number of completed items grows in multiples of 32.
The total number shown when you use steps is the number of steps. Notice how the number of completed steps grows one by one.
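For array inputs, the simplest fix is to drop steps_per_epoch entirely and let fit slice the data itself. A minimal sketch (reusing get_model(), train_images and train_labels from the question's script; the explicit batch_size=32 just makes the default visible):
from tensorflow import keras

# Reuses get_model(), train_images and train_labels from the question's script
model = get_model()

batch_size = 32                          # fit's default batch size
print(len(train_images) // batch_size)   # 60000 // 32 = 1875 steps per epoch

# No steps_per_epoch: fit slices the arrays into 1875 batches of 32 on its own
model.fit(train_images, train_labels, epochs=1, batch_size=batch_size)
This trains the same 1875 batches per epoch as your fast run, just with the batch size spelled out.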
Upvotes: 5