joy

Reputation: 3707

Kaggle problem for dogs and cats with kernel using ANN is not working for me

I have written Artificial Neural Network (ANN) code to solve the Kaggle Dogs vs. Cats kernel problem, but during training it shows loss=nan and bad accuracy. My code can be found at https://www.kaggle.com/dilipkumar2k6/dogs-vs-cats-with-new-kernel/notebook

Following are the details of the error.

from tensorflow import keras
# First apply Artificial neural network (ANN)
ann = keras.Sequential([
    keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)), # flatten 3D input to 1D
    keras.layers.Dense(3000, activation='relu'), # more hidden layers give better performance
    keras.layers.Dense(1000, activation='relu'), # more hidden layers give better performance
    keras.layers.Dense(100, activation='relu'), # more hidden layers give better performance
    keras.layers.Dense(2, activation='sigmoid')
])
ann.compile(optimizer='SGD', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
ann.fit(train_X, train_y, epochs=10)

Error

Epoch 1/10
438/438 [==============================] - 2s 2ms/step - loss: nan - accuracy: 5.0000e-04
Epoch 2/10
438/438 [==============================] - 1s 2ms/step - loss: nan - accuracy: 0.0000e+00

Upvotes: 0

Views: 59

Answers (1)

AloneTogether

Reputation: 26708

Using a sigmoid activation function in your output layer seems a bit strange to me when using sparse_categorical_crossentropy (although it could also work), since two independent sigmoid outputs do not form a probability distribution over the classes. Anyway, I think you should consider changing this line:

keras.layers.Dense(2, activation='sigmoid') 

to

keras.layers.Dense(1, activation='sigmoid') 

and use tf.keras.losses.BinaryCrossentropy(). Or change your activation function to softmax and leave the rest as it is.
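
For example, the first option could look like this (a minimal sketch of the relevant changes only, keeping the question's layer sizes; the IMG_SIZE value here is a placeholder):

import tensorflow as tf

IMG_SIZE = 64 # placeholder; use the same size as in your preprocessing

ann = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Dense(3000, activation='relu'),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid') # single unit for the binary 0/1 labels
])
ann.compile(optimizer='SGD', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])

With this setup the labels stay as integers 0/1 and the model outputs a single probability for the positive class.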

You should also consider redesigning your model and using at least one tf.keras.layers.Conv2D layer before flattening the data. Note that the example below also rescales the pixel values to [0, 1], which helps avoid numerical issues such as nan losses. Here is a working example:

import tensorflow_datasets as tfds
import tensorflow as tf

ds, ds_info = tfds.load('cats_vs_dogs', split='train', with_info=True)

normalization_layer = tf.keras.layers.Rescaling(1./255)

def resize_inputs(data):
  images, labels = data['image'], data['label']
  images = tf.image.resize(normalization_layer(images),[64, 64], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
  return images, labels

ds = ds.map(resize_inputs).batch(64)

ann = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
    tf.keras.layers.Flatten(), # flatten 3D input to 1D
    tf.keras.layers.Dense(200, activation='relu'), # more hidden layers give better performance
    tf.keras.layers.Dense(100, activation='relu'), # more hidden layers give better performance
    tf.keras.layers.Dense(50, activation='relu'), # more hidden layers give better performance
    tf.keras.layers.Dense(1, activation='sigmoid')    
])
ann.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
ann.fit(ds, epochs=10)

Epoch 1/10
364/364 [==============================] - 58s 140ms/step - loss: 0.8692 - accuracy: 0.5902
Epoch 2/10
364/364 [==============================] - 51s 141ms/step - loss: 0.6155 - accuracy: 0.6559
Epoch 3/10
364/364 [==============================] - 51s 141ms/step - loss: 0.5708 - accuracy: 0.7009
Epoch 4/10
364/364 [==============================] - 51s 140ms/step - loss: 0.5447 - accuracy: 0.7262
...

You can experiment with this example and find out which combination of activation function, loss function, and number of output nodes works best for you.
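
For instance, the softmax variant mentioned above only changes the output layer and the loss, while everything else stays the same (a sketch under the same setup as the example above, not a tested result):

ann = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=3, input_shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax') # one node per class
])
ann.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
ann.fit(ds, epochs=10)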

Upvotes: 2
