Reputation: 3707
I have written artificial neural network code to solve the Kaggle Dogs vs. Cats kernel problem, but during training it shows loss=nan and bad accuracy. My code can be found at https://www.kaggle.com/dilipkumar2k6/dogs-vs-cats-with-new-kernel/notebook
Following are the details of the error:
from tensorflow import keras

# First apply an artificial neural network (ANN)
ann = keras.Sequential([
    keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),  # flatten 3D input to 1D
    keras.layers.Dense(3000, activation='relu'),  # more hidden layers give better performance
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(2, activation='sigmoid')
])
ann.compile(optimizer='SGD', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
ann.fit(train_X, train_y, epochs=10)
Error
Epoch 1/10
438/438 [==============================] - 2s 2ms/step - loss: nan - accuracy: 5.0000e-04
Epoch 2/10
438/438 [==============================] - 1s 2ms/step - loss: nan - accuracy: 0.0000e+00
Upvotes: 0
Views: 59
Reputation: 26708
Using a sigmoid activation function in your output layer seems a bit strange to me when combined with sparse_categorical_crossentropy (although it could also work); that loss expects the outputs to form a probability distribution, which two independent sigmoid units do not guarantee. Anyway, I think you should consider changing this line:
keras.layers.Dense(2, activation='sigmoid')
to
keras.layers.Dense(1, activation='sigmoid')
and use tf.keras.losses.BinaryCrossentropy() instead. Alternatively, change your activation function to softmax and leave the rest as it is.
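A minimal sketch of both options, assuming the IMG_SIZE constant and the train_X/train_y arrays from your code:

from tensorflow import keras
import tensorflow as tf

# Option 1: a single sigmoid unit with binary cross-entropy
ann = keras.Sequential([
    keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    keras.layers.Dense(3000, activation='relu'),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # one unit: probability of class 1
])
ann.compile(optimizer='SGD', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])

# Option 2: keep two output nodes but make them a proper distribution:
# keras.layers.Dense(2, activation='softmax') together with
# loss='sparse_categorical_crossentropy', as in your original compile call.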
You should also consider redesigning your model and using at least one tf.keras.layers.Conv2D layer before flattening the data. Here is a working example:
import tensorflow_datasets as tfds
import tensorflow as tf

ds, ds_info = tfds.load('cats_vs_dogs', split='train', with_info=True)
normalization_layer = tf.keras.layers.Rescaling(1./255)  # scale pixels to [0, 1]

def resize_inputs(data):
    images, labels = data['image'], data['label']
    images = tf.image.resize(normalization_layer(images), [64, 64],
                             method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return images, labels

ds = ds.map(resize_inputs).batch(64)

ann = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),  # flatten 3D feature maps to 1D
    tf.keras.layers.Dense(200, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
ann.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
ann.fit(ds, epochs=10)
Epoch 1/10
364/364 [==============================] - 58s 140ms/step - loss: 0.8692 - accuracy: 0.5902
Epoch 2/10
364/364 [==============================] - 51s 141ms/step - loss: 0.6155 - accuracy: 0.6559
Epoch 3/10
364/364 [==============================] - 51s 141ms/step - loss: 0.5708 - accuracy: 0.7009
Epoch 4/10
364/364 [==============================] - 51s 140ms/step - loss: 0.5447 - accuracy: 0.7262
...
You can experiment with this example and find out which combination of activation function, loss function, and number of output nodes works best for you.
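For instance, a hypothetical helper (not part of the answer above, just a sketch) that reuses the ds pipeline from the working example and swaps only the output head and loss:

def build_model(head_units, activation, loss):
    # same layers as the working example; only the output head varies
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(200, activation='relu'),
        tf.keras.layers.Dense(100, activation='relu'),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(head_units, activation=activation)
    ])
    model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])
    return model

# sigmoid head vs. softmax head, trained on the same data
sigmoid_model = build_model(1, 'sigmoid', tf.keras.losses.BinaryCrossentropy())
softmax_model = build_model(2, 'softmax', 'sparse_categorical_crossentropy')
sigmoid_model.fit(ds, epochs=10)
softmax_model.fit(ds, epochs=10)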
Upvotes: 2