Keras .fit giving better performance than manual TensorFlow

Question

I'm new to TensorFlow and Keras. To get started, I followed the https://www.tensorflow.org/tutorials/quickstart/advanced tutorial. I'm now adapting it to train on CIFAR10 instead of the MNIST dataset. I recreated this model https://keras.io/examples/cifar10_cnn/ and I'm trying to run it in my own codebase.

Logically, if the model, batch size, and optimizer are all the same, the two should perform identically, but they don't. I thought I might be making a mistake in preparing the data, so I copied the model.fit call from the Keras code into my script, and it still performs better. Using .fit gives me around 75% accuracy in 25 epochs, while the manual method takes around 60 epochs to reach the same accuracy. With .fit I also achieve a slightly better maximum accuracy.

What I want to know is: Is .fit doing something behind the scenes that's optimizing training? What do I need to add to my code to get the same performance? Am I doing something obviously wrong?

Thanks for your time.

Main code:


import tensorflow as tf
from tensorflow import keras
import msvcrt
from Plotter import Plotter


#########################Configuration Settings#############################

BatchSize = 32
ModelName = "CifarModel"

############################################################################


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

print("x_train",x_train.shape)
print("y_train",y_train.shape)
print("x_test",x_test.shape)
print("y_test",y_test.shape)

x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)



train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).batch(BatchSize)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BatchSize)


loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0001,decay=1e-6)

# Create an instance of the model
model = ModelManager.loadModel(ModelName,10)


train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')



########### Using this function I achieve better results ##################

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.fit(x_train, y_train,
              batch_size=BatchSize,
              epochs=100,
              validation_data=(x_test, y_test),
              shuffle=True,
              verbose=2)

############################################################################

########### Using the below code I achieve worse results ##################

@tf.function
def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images, training=True)
    loss = loss_object(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  train_loss(loss)
  train_accuracy(labels, predictions)

@tf.function
def test_step(images, labels):
  predictions = model(images, training=False)
  t_loss = loss_object(labels, predictions)

  test_loss(t_loss)
  test_accuracy(labels, predictions)

epoch = 0
InterruptLoop = False
while InterruptLoop == False:
  #Shuffle training data
  train_ds.shuffle(1000)
  epoch = epoch + 1
  # Reset the metrics at the start of the next epoch
  train_loss.reset_states()
  train_accuracy.reset_states()
  test_loss.reset_states()
  test_accuracy.reset_states()

  for images, labels in train_ds:
    train_step(images, labels)

  for test_images, test_labels in test_ds:
    test_step(test_images, test_labels)

  # Store the percentages under separate names so the metric objects
  # are not overwritten and can still be reset at the next epoch
  test_accuracy_pct = test_accuracy.result() * 100
  train_accuracy_pct = train_accuracy.result() * 100

  #Print update to console
  template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
  print(template.format(epoch,
                        train_loss.result(),
                        train_accuracy_pct,
                        test_loss.result(),
                        test_accuracy_pct))

  # Check if keyboard pressed
  while msvcrt.kbhit():
    char = str(msvcrt.getch())
    if char == "b'q'":
      InterruptLoop = True
      print("Stopping loop")

The model:

from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPool2D
from tensorflow.keras import Model

class ModelData(Model):
  def __init__(self,NumberOfOutputs):
    super(ModelData, self).__init__()
    self.conv1 = Conv2D(32, 3, activation='relu', padding='same', input_shape=(32,32,3))
    self.conv2 = Conv2D(32, 3, activation='relu')
    self.maxpooling1 = MaxPool2D(pool_size=(2,2))
    self.dropout1 = Dropout(0.25)
    ############################
    self.conv3 = Conv2D(64,3,activation='relu',padding='same')
    self.conv4 = Conv2D(64,3,activation='relu')
    self.maxpooling2 = MaxPool2D(pool_size=(2,2))
    self.dropout2 = Dropout(0.25)
    ############################
    self.flatten = Flatten()
    self.d1 = Dense(512, activation='relu')
    self.dropout3 = Dropout(0.5)
    self.d2 = Dense(NumberOfOutputs,activation='softmax')

  def call(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    x = self.maxpooling1(x)
    x = self.dropout1(x)
    x = self.conv3(x)
    x = self.conv4(x)
    x = self.maxpooling2(x)
    x = self.dropout2(x)
    x = self.flatten(x)
    x = self.d1(x)
    x = self.dropout3(x)
    x = self.d2(x)
    return x

Eugenio Anselmino · Accepted Answer

I know this question already has an answer, but I faced the same problem and the solution turned out to be something different that isn't mentioned in the documentation.

Here is the answer I found on GitHub (link below), which solved the issue in my case:

The problem is caused by broadcasting in your loss function in the custom loop. Make sure that the dimensions of predictions and label is equal. At the moment (for MAE) they are [128,1] and [128]. Just make use of tf.squeeze or tf.expand_dims.

Link: https://github.com/tensorflow/tensorflow/issues/28394

In short: when computing the loss, always check that the shapes of the predictions and the labels match.
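To make the broadcasting issue concrete, here is a minimal sketch. It uses NumPy rather than TensorFlow, on the assumption that TensorFlow's broadcasting follows the same rules; the data is made up for illustration, and np.squeeze stands in for tf.squeeze. Subtracting a [128] tensor from a [128, 1] tensor silently produces a [128, 128] matrix, so the mean absolute error is averaged over every prediction/label pair instead of the matching ones.

```python
import numpy as np

# Hypothetical batch: 128 labels, and predictions that are each off by 0.1.
rng = np.random.default_rng(0)
labels = rng.normal(size=(128,))              # shape (128,)
predictions = labels.reshape(128, 1) + 0.1    # shape (128, 1)

# Mismatched shapes: (128, 1) - (128,) broadcasts to (128, 128),
# comparing every prediction against every label.
bad_diff = predictions - labels
bad_mae = np.abs(bad_diff).mean()

# Matching shapes: drop the trailing axis first (tf.squeeze in TensorFlow).
good_diff = np.squeeze(predictions, axis=-1) - labels
good_mae = np.abs(good_diff).mean()

print("broadcast shape:", bad_diff.shape)   # (128, 128)
print("squeezed shape: ", good_diff.shape)  # (128,)
print("MAE with broadcasting:", bad_mae)
print("MAE with matching shapes:", good_mae)
```

No error is raised in the mismatched case, which is why the problem is easy to miss: the loss is still a valid number, just not the one you intended. Printing or asserting the shapes of both tensors right before the loss call is a cheap way to catch it.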
