Reputation: 63
I'm new to TensorFlow and Keras. To get started, I followed the tutorial. I'm now adapting it to train on CIFAR10 instead of the MNIST dataset. I recreated this model and I'm trying to run it in my own codebase.
Logically, if the model, batch size and optimizer are all the same, then the two should perform identically, but they don't. I thought I might be making a mistake in preparing the data, so I copied the .fit call from the Keras code into my script, and it still performs better than my manual training loop. Using .fit gives me around 75% accuracy in 25 epochs, while the manual method takes around 60 epochs to reach the same level. With .fit I also achieve a slightly better maximum accuracy.
What I want to know is: Is .fit doing something behind the scenes that's optimizing training? What do I need to add to my code to get the same performance? Am I doing something obviously wrong?
Thanks for your time.
Main code:
import tensorflow as tf
from tensorflow import keras
import msvcrt
from Plotter import Plotter
#########################Configuration Settings#############################
BatchSize = 32
ModelName = "CifarModel"
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(BatchSize)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BatchSize)
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0001,decay=1e-6)
# Create an instance of the model
model = ModelManager.loadModel(ModelName,10)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
########### Using this function I achieve better results ##################
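# NOTE: parts of these two calls are missing from the post. The optimizer, loss
# and batch_size arguments are assumed; any further fit() arguments are unknown.
model.compile(optimizer=optimizer, loss=loss_object, metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=BatchSize,
          validation_data=(x_test, y_test))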
########### Using the below code I achieve worse results ##################
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Accumulate the running loss and accuracy for this epoch
    train_loss(loss)
    train_accuracy(labels, predictions)

def test_step(images, labels):
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
epoch = 0
InterruptLoop = False
while not InterruptLoop:
    #Shuffle training data

    epoch = epoch + 1

    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for images, labels in train_ds:
        train_step(images, labels)
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)

    # Store the percentages under new names so the metric objects are not overwritten
    test_acc_pct = test_accuracy.result() * 100
    train_acc_pct = train_accuracy.result() * 100

    #Print update to console
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch, train_loss.result(), train_acc_pct,
                          test_loss.result(), test_acc_pct))

    # Check if keyboard pressed
    while msvcrt.kbhit():
        char = str(msvcrt.getch())
        if char == "b'q'":
            InterruptLoop = True
            print("Stopping loop")
The model:
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPool2D
from tensorflow.keras import Model

class ModelData(Model):
    def __init__(self, NumberOfOutputs):
        super(ModelData, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu', padding='same', input_shape=(32, 32, 3))
        self.conv2 = Conv2D(32, 3, activation='relu')
        self.maxpooling1 = MaxPool2D(pool_size=(2, 2))
        self.dropout1 = Dropout(0.25)
        self.conv3 = Conv2D(64, 3, activation='relu', padding='same')
        self.conv4 = Conv2D(64, 3, activation='relu')
        self.maxpooling2 = MaxPool2D(pool_size=(2, 2))
        self.dropout2 = Dropout(0.25)
        self.flatten = Flatten()
        self.d1 = Dense(512, activation='relu')
        self.dropout3 = Dropout(0.5)
        self.d2 = Dense(NumberOfOutputs, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.maxpooling1(x)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.maxpooling2(x)
        x = self.dropout2(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.dropout3(x)
        x = self.d2(x)
        return x
Upvotes: 6
Views: 1175
Reputation: 145
I know this question already has an answer, but I faced the same problem and the solution turned out to be something different that isn't mentioned in the documentation.
Below I copy the answer I found on GitHub, which solved the issue in my case:
The problem is caused by broadcasting in your loss function in the custom loop. Make sure that the dimensions of the predictions and the labels are equal. At the moment (for MAE) they are [128,1] and [128]. Just make use of tf.squeeze or tf.expand_dims.
Basic translation: when computing the loss, always be sure of the tensors' shapes.
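To make the shape issue concrete, here is a minimal sketch of what goes wrong. It uses NumPy rather than TensorFlow so that it runs anywhere (an assumption made for illustration; TensorFlow follows the same NumPy-style broadcasting rules). The array contents are made up, and np.squeeze stands in for tf.squeeze.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.normal(size=128)                # shape (128,)
predictions = labels.reshape(128, 1) + 0.1   # shape (128, 1), each prediction is off by 0.1

# Mismatched shapes: (128, 1) - (128,) silently broadcasts to (128, 128),
# so every prediction is compared against every label.
bad_diff = predictions - labels
bad_mae = np.abs(bad_diff).mean()

# Matching shapes: drop the trailing axis first (the role tf.squeeze plays in TensorFlow).
good_diff = np.squeeze(predictions, axis=-1) - labels
good_mae = np.abs(good_diff).mean()

print("mismatched:", bad_diff.shape, "MAE:", bad_mae)
print("matched:   ", good_diff.shape, "MAE:", good_mae)
```

No error is raised in the mismatched case, which is why the problem is easy to miss: the loss is simply computed over the wrong set of pairs, and the gradients follow from that wrong loss.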
Upvotes: 2
Mentioning the solution here (Answer Section) even though it is present in the Comments, for the benefit of the Community.
On the same dataset, the accuracy of a model trained with Keras can differ from that of a model built using TensorFlow, mainly if the data is shuffled: when we shuffle the data, the split between training and testing (or validation) data will be different, resulting in different train and test data in the two cases (Keras and TensorFlow).
If we want to observe similar results on the same dataset and with a similar architecture in Keras and in TensorFlow, we can turn off shuffling the data.
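For context, Model.fit shuffles array inputs before every epoch by default (its shuffle argument defaults to True), while a tf.data pipeline built with from_tensor_slices(...).batch(...) keeps the original order unless Dataset.shuffle is called. The sketch below illustrates the difference with plain NumPy; iterate_batches is a hypothetical helper written for this example, not part of Keras or TensorFlow.

```python
import numpy as np

def iterate_batches(x, y, batch_size, shuffle, rng):
    """Yield (x, y) mini-batches for one epoch, optionally in a fresh random order."""
    order = rng.permutation(len(x)) if shuffle else np.arange(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data: ten samples whose label equals their index.
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
rng = np.random.default_rng(0)

# Without shuffling, every epoch sees the same batches in the same order.
fixed_epoch_1 = [yb.tolist() for _, yb in iterate_batches(x, y, 5, False, rng)]
fixed_epoch_2 = [yb.tolist() for _, yb in iterate_batches(x, y, 5, False, rng)]

# With shuffling, each epoch draws a new permutation of the same samples.
shuffled_epoch = [yb.tolist() for _, yb in iterate_batches(x, y, 5, True, rng)]

print(fixed_epoch_1, fixed_epoch_2, shuffled_epoch)
```

To compare the two training paths on equal terms, either pass shuffle=False to fit, or add a shuffle step to the manual loop so that both see the data in a random order each epoch.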
Hope this helps. Happy Learning!
Upvotes: 1