Daan

Reputation: 357

keras loss is nan but accuracy well defined

I'm training a U-Net using Keras in Python with the TensorFlow backend. After one or two training steps (with batch size 1) my loss turns to NaN. I reviewed the data and verified that there are no NaN values in my training data. I also defined a clipnorm in order to prevent an exploding gradient, but this had no effect. Does anyone have an idea where this NaN loss might originate?

I use the following code:

import keras
import os
import random
import numpy as np





path = 'db/clouds_total/new/'

epochs = 280
classes = 2
files_labels = os.listdir(path +  'accepted_np' )
files_raws =  os.listdir(path + 'raw_np' )


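# convert an integer label array to a one-hot encoding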
def get_one_hot(targets, nb_classes):
   res = np.eye(nb_classes)[np.array(targets).reshape(-1)]
   return res.reshape(list(targets.shape)+[nb_classes])


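# endlessly yield (raw image, label) pairs, one sample at a time, in random order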
def generator():
   while(True):
     files_labels = os.listdir(path +  'accepted_np' )
     files_raws =  os.listdir(path + 'raw_np' )

     samp = np.random.choice( np.arange(len(files_labels)) , replace = False, size = len(files_labels) )

     for i in samp: 
        label = np.load( path + 'accepted_np/' + files_labels[i])
        r = np.load(path + 'raw_np/' + files_raws[i])
        yield( [r, label])

# build the network

input_im = keras.engine.Input(shape=[512, 512, 14], dtype='float32')

l0 = keras.layers.convolutional.Conv2D( filters=64, kernel_size= (3,3),padding="same",     activation = 'relu' )(input_im)
l0 = keras.layers.convolutional.Conv2D( filters=64, kernel_size= (3,3),padding="same",     activation = 'relu' )(l0)

l1 = keras.layers.AvgPool2D(pool_size = (2,2))(l0)
l1 = keras.layers.convolutional.Conv2D( filters=128, kernel_size= (3,3),padding="same",     activation = 'relu' )(l1)
l1 = keras.layers.convolutional.Conv2D( filters=128, kernel_size= (3,3),padding="same",     activation = 'relu' )(l1)

l2 = keras.layers.AvgPool2D(pool_size = (2,2))(l1)
l2 = keras.layers.convolutional.Conv2D( filters=256, kernel_size= (3,3),padding="same",     activation = 'relu' )(l2)
l2 = keras.layers.convolutional.Conv2D( filters=256, kernel_size= (3,3),padding="same",     activation = 'relu' )(l2)

l3 = keras.layers.AvgPool2D(pool_size = (2,2))(l2)
l3 = keras.layers.convolutional.Conv2D( filters=512, kernel_size= (3,3),padding="same",     activation = 'relu' )(l3)
l3 = keras.layers.convolutional.Conv2D( filters=512, kernel_size= (3,3),padding="same",     activation = 'relu' )(l3)

l4 = keras.layers.AvgPool2D(pool_size = (2,2))(l3)
l4 = keras.layers.convolutional.Conv2D( filters=1024, kernel_size= (3,3),padding="same",     activation = 'relu' )(l4)
l4 = keras.layers.convolutional.Conv2D( filters=1024, kernel_size= (3,3),padding="same",     activation = 'relu' )(l4)


l3_up = keras.layers.convolutional.Conv2DTranspose(filters = 512 , kernel_size=(3,3) ,strides = (2, 2), padding="same")(l4)
l3_up = keras.layers.concatenate([l3,l3_up])
l3_up = keras.layers.convolutional.Conv2D( filters=512, kernel_size= (3,3),padding="same",     activation = 'relu' )(l3_up)
l3_up = keras.layers.convolutional.Conv2D( filters=512, kernel_size= (3,3),padding="same",     activation = 'relu' )(l3_up)

l2_up = keras.layers.convolutional.Conv2DTranspose(filters = 256 , kernel_size=(3,3) ,strides = (2, 2), padding="same")(l3_up)
l2_up = keras.layers.concatenate([l2,l2_up])
l2_up = keras.layers.convolutional.Conv2D( filters=256, kernel_size= (3,3),padding="same",     activation = 'relu' )(l2_up)
l2_up = keras.layers.convolutional.Conv2D( filters=256, kernel_size= (3,3),padding="same",     activation = 'relu' )(l2_up)

l1_up = keras.layers.convolutional.Conv2DTranspose(filters = 128 , kernel_size=(3,3) ,strides = (2, 2), padding="same")(l2_up)
l1_up = keras.layers.concatenate([l1,l1_up])
l1_up = keras.layers.convolutional.Conv2D( filters=128, kernel_size= (3,3),padding="same",     activation = 'relu' )(l1_up)
l1_up = keras.layers.convolutional.Conv2D( filters=128, kernel_size= (3,3),padding="same",     activation = 'relu' )(l1_up)

l0_up = keras.layers.convolutional.Conv2DTranspose(filters = 64 , kernel_size=(3,3) ,strides = (2, 2), padding="same")(l1_up)
l0_up = keras.layers.concatenate([l0,l0_up])
l0_up = keras.layers.convolutional.Conv2D( filters=64, kernel_size= (3,3),padding="same",     activation = 'relu' )(l0_up)
l0_up = keras.layers.convolutional.Conv2D( filters=64, kernel_size= (3,3),padding="same",     activation = 'relu' )(l0_up)

output = keras.layers.convolutional.Conv2D( filters=classes, kernel_size= (3,3),padding="same",     activation = 'relu' )(l0_up)

model = keras.models.Model(inputs = input_im, outputs = output)

opt = keras.optimizers.Adam(lr=0.0001, decay=0, clipnorm=0.5)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics = ["accuracy"])



#train 
for epoch in range(epochs):
    print(epoch)
    model.fit_generator(generator = generator(), steps_per_epoch = len(files_labels), epochs = 1 )
    if epoch % 20 == 0:
       name = path + 'model/model_' + str(epoch)
       model.save(name)

Upvotes: 0

Views: 1423

Answers (2)

yjcrocks

Reputation: 126

I think the NaN comes from a 0.0 * log(0.0)-style computation inside the cross-entropy.

ReLU emits exactly 0.0 whenever its input is negative, so the final layer can output zero for every class. Categorical cross-entropy computes -t * log(p) on the predicted probabilities (after normalising them to sum to 1), and an all-zero output turns that normalisation into 0/0 and the log term into log(0), which results in NaN.

A sigmoid ensures that the output probability stays strictly between 0 and 1, so this cannot happen.
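
To make this concrete, here is a small NumPy sketch (a simplified stand-in for the loss, not the exact Keras implementation, assuming it normalises a non-softmax output and then takes the log): an output layer that can emit exactly 0 for every class produces NaN, while an activation that stays inside (0, 1) gives a finite loss.

import numpy as np

def crossentropy(t, p):
    # simplified loss: normalise the predictions, then -sum(t * log(p))
    p = p / p.sum()
    return -np.sum(t * np.log(p))

t = np.array([1.0, 0.0])           # one-hot target
p_relu = np.array([0.0, 0.0])      # a ReLU output layer can emit exactly zero for every class
p_sigmoid = np.array([0.7, 0.3])   # a sigmoid output stays strictly inside (0, 1)

print(crossentropy(t, p_relu))     # nan: 0/0 in the normalisation, then log of nan
print(crossentropy(t, p_sigmoid))  # ~0.357, a finite loss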

Upvotes: 3

Daan

Reputation: 357

I threw in a sigmoid at the end instead of a ReLU. This seems to help. I'm not quite sure why, as I thought the clipnorm would take care of the exploding gradient. Apparently the cross-entropy gets NaN when the input values become too large?
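
For reference, the change amounts to swapping the activation of the final layer (a sketch, assuming the rest of the model from the question stays the same); with categorical_crossentropy a softmax over the class channel would be the more conventional choice, but an element-wise sigmoid also keeps every prediction strictly inside (0, 1):

# final layer with a bounded activation instead of ReLU, so the predicted
# "probabilities" can never all be exactly zero
output = keras.layers.convolutional.Conv2D(filters=classes, kernel_size=(3, 3),
                                            padding="same", activation='sigmoid')(l0_up)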

Upvotes: 0
