Ronny

Reputation: 31

Working with gradients in tensorflow and keras

I am working on an AI NLP model that classifies text into 6 categories. I can't disclose the specific purpose of the model, sorry. Here it is:

import numpy as np
import tensorflow as tf
from tensorflow import keras

Input = keras.layers.Input((1368, ))
embedding = keras.layers.Embedding(1386, 50)(Input)

# The model has two branches: one uses 1D convnets, the other uses RNNs.

# The convnet part
convent_1 = keras.layers.Conv1D(128, 10, activation = "relu")(embedding)
convent_2 = keras.layers.Conv1D(64, 7, activation = "relu")(convent_1)
convent_3 = keras.layers.Conv1D(32, 5, activation = "relu")(convent_2)
maxpool_1 = keras.layers.MaxPool1D(2)(convent_3)
maxpool_2 = keras.layers.MaxPool1D(2)(maxpool_1)
flatten_1 = keras.layers.Flatten()(maxpool_2)
reducing_size_dense = keras.layers.Dense(128, activation = "relu")(flatten_1) # To match the 128-dim output of the RNN branch so the two can be added.

# The RNN part
rnn_1 = keras.layers.GRU(128, activation = "relu", return_sequences = True)(embedding)
rnn_2 = keras.layers.GRU(128, activation = "relu", return_sequences = False)(rnn_1)
flatten_2 = keras.layers.Flatten()(rnn_2)

# The classifier
sum_flatten = keras.layers.Add()([reducing_size_dense, flatten_2])
dense_1 = keras.layers.Dense(128, activation = "relu")(sum_flatten)
dense_2 = keras.layers.Dense(128, activation = "relu")(dense_1)
dense_3 = keras.layers.Dense(128, activation = "relu")(dense_2)
dense_4 = keras.layers.Dense(128, activation = "relu")(dense_3)
output_layer = keras.layers.Dense(6, activation = "softmax")(dense_4)
# We are using the quadratic weighted kappa loss function, therefore we will have to take the argmax of the output layer.
# Defining the argmax layer

class Argmax_layer(keras.layers.Layer):

    def call(self, input): # Keras layers are called by the call method.
        return (tf.argmax(input, axis = -1) + 1) # We have to add 1 because the indices are in the range 0-5. We want them to be in the range 1-6.

argmax_layer = Argmax_layer()(output_layer)


model = keras.Model(Input, argmax_layer)
keras.utils.plot_model(model, show_shapes=True)

It is a Keras functional model, as shown above. The problem occurs when I run training:

callback_list = [keras.callbacks.EarlyStopping(monitor = "val_accuracy", patience = 10),
                 keras.callbacks.ModelCheckpoint(monitor = "val_accuracy", save_best_only = True, filepath = "scoring_model.keras"),
                 keras.callbacks.ReduceLROnPlateau(monitor = "val_accuracy", patience = 5, factor = 0.25)]

model.compile(optimizer = "rmsprop", loss = kappa_weighted_quad_loss, metrics = ["accuracy"])
history_model = model.fit(training_data, training_targets, validation_data = (validation_data, validation_targets), epochs = 100, callbacks = callback_list, batch_size = 20)
plot = dm.plotter(history = history_model.history)
plot.plot_loss()
plot.plot_acc()

Note that plot_acc() and plot_loss() are methods of dm.plotter, a class from my personal module that helps me preprocess data and manage AI models. I usually begin with these naive methods and gradually move to TensorBoard. You can see that I have implemented a custom loss function. Here it is:


# The evaluation of the result uses the weighted kappa, thus we will be defining that loss function.
def kappa_weighted_quad_loss(targets, preds):

    targets = targets - 1 # Because the targets and preds will be in the range 1-6, we subtract 1 from each element to bring them to the range 0-5
    preds = preds - 1

    targets = tf.cast(targets, tf.int16)
    preds = tf.cast(preds, tf.int16)

    confusion_matrix = np.zeros(shape = (6, 6))
    for i, j in zip(targets, preds):
        confusion_matrix[i, j] += 1

    w_i_j = lambda i, j: (i - j)**2/25 # There are 6 categories so N = 6 and N - 1 = 5
    O_i_dot = lambda i: np.sum(confusion_matrix[i, :])
    O_dot_j = lambda j: np.sum(confusion_matrix[:, j])
    E_i_j = lambda i, j: np.outer(O_i_dot(i) , O_dot_j(j)) / np.sum(confusion_matrix)

    numerator = np.sum([w_i_j(i, j) * confusion_matrix[i, j] for i in range(6) for j in range(6)])
    denominator = np.sum([w_i_j(i, j) * E_i_j(i , j) for i in range(6) for j in range(6)])
    return 1 - numerator/denominator
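
For reference, with N = 6 categories the formula being computed is:

kappa = 1 - sum_ij(w_ij * O_ij) / sum_ij(w_ij * E_ij),    w_ij = (i - j)^2 / (N - 1)^2

where O is the observed confusion matrix and E_ij = O_i. * O_.j / total is the count expected from the row and column marginals.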

It is Cohen's weighted kappa loss for multi-class classification. This is the error I am facing while training on a GPU:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[41], line 6
      1 callback_list = [keras.callbacks.EarlyStopping(monitor = "val_accuracy", patience = 10),
      2                  keras.callbacks.ModelCheckpoint(monitor = "val_accuracy", save_best_only = True, filepath = "scoring_model.keras"),
      3                  keras.callbacks.ReduceLROnPlateau(monitor = "val_accuracy", patience = 5, factor = 0.25)]
      5 model.compile(optimizer = "rmsprop", loss = kappa_weighted_quad_loss, metrics = ["accuracy"])
----> 6 history_model = model.fit(training_data, training_targets, validation_data = (validation_data, validation_targets), epochs = 100, callbacks = callback_list, batch_size = 20)
      7 plot = dm.plotter(history = history_model.history)
      8 plot.plot_loss()

File /opt/conda/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

File /opt/conda/lib/python3.10/site-packages/keras/src/optimizers/base_optimizer.py:662, in BaseOptimizer._filter_empty_gradients(self, grads, vars)
    659         missing_grad_vars.append(v.name)
    661 if not filtered_grads:
--> 662     raise ValueError("No gradients provided for any variable.")
    663 if missing_grad_vars:
    664     warnings.warn(
    665         "Gradients do not exist for variables "
    666         f"{list(reversed(missing_grad_vars))} when minimizing the loss."
    667         " If using `model.compile()`, did you forget to provide a "
    668         "`loss` argument?"
    669     )

ValueError: No gradients provided for any variable.

I can't properly guess the reason for this error, but it seems that TensorFlow is unable to calculate the gradients with tf.GradientTape and its other gradient machinery.

I think it is because of the discontinuous Argmax_layer, but I am not confident in this explanation. Is there a way to calculate the gradients manually, or some other workaround?
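
For what it's worth, here is a minimal check of my suspicion (not my real model): anything computed downstream of tf.argmax comes back with a None gradient under tf.GradientTape:

import tensorflow as tf

logits = tf.Variable([[0.1, 0.2, 0.7]])
with tf.GradientTape() as tape:
    probs = tf.nn.softmax(logits)
    label = tf.argmax(probs, axis = -1) + 1   # same idea as my Argmax_layer
    loss = tf.cast(label, tf.float32) * 1.0   # any loss built on top of the argmax
print(tape.gradient(loss, logits))  # prints None -> "No gradients provided for any variable."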

Upvotes: 0

Views: 62

Answers (1)

charlesnadeau

Reputation: 26

Try this as your loss, ported to TensorFlow. I couldn't test it by reproducing your case, but at worst it will point you in the right direction:

import tensorflow as tf

def kappa_weighted_quad_loss(targets, preds):

    targets = targets - 1 # Because the targets and preds will be in the range 1-6, subtract 1 from each element to bring them to the range 0-5
    preds = preds - 1

    targets = tf.cast(targets, tf.int16)
    preds = tf.cast(preds, tf.int16)

    # Observed confusion matrix, cast to float so it can be weighted and divided
    confusion_matrix = tf.cast(tf.math.confusion_matrix(targets, preds, 6), tf.float32)

    w_i_j = lambda i, j: (i - j)**2 / 25 # There are 6 categories so N = 6 and (N - 1)**2 = 25
    O_i_dot = tf.math.reduce_sum(confusion_matrix, axis = 1) # row marginals O_i.
    O_dot_j = tf.math.reduce_sum(confusion_matrix, axis = 0) # column marginals O_.j
    E_i_j = lambda i, j: O_i_dot[i] * O_dot_j[j] / tf.math.reduce_sum(confusion_matrix)

    numerator = tf.math.reduce_sum([w_i_j(i, j) * confusion_matrix[i, j] for i in range(6) for j in range(6)])
    denominator = tf.math.reduce_sum([w_i_j(i, j) * E_i_j(i, j) for i in range(6) for j in range(6)])
    return 1 - numerator/denominator
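
I couldn't verify the numbers, but as a quick eager sanity check (assuming, as you describe, integer labels in the 1-6 range) you can call it directly on a toy batch:

# Hypothetical toy batch, only to exercise the function eagerly
targets = tf.constant([1, 2, 3, 4, 5, 6, 2, 3])
preds = tf.constant([1, 2, 3, 4, 5, 5, 2, 4])
print(kappa_weighted_quad_loss(targets, preds)) # scalar tf.Tensor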

Upvotes: 0
