Reputation: 163
I would like to make a TensorFlow model where the outputs respect a mathematical condition, namely that output 0 is a scalar function and all subsequent outputs are its partial derivatives w.r.t. the input. This is because my observations are the scalar function and its partials, and not using the partials for training would be a waste of information.
For now, simply using tf.gradients works as long as I don't build a custom training loop, i.e. when I don't use eager execution. The model is built like this, and training works as expected:
import tensorflow as tf
from tensorflow.keras import losses
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
# Creating a model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Dropout,
    Flatten,
    Concatenate,
    Input,
    Lambda,
)
# Custom activation function
from tensorflow.keras.layers import Activation
from tensorflow.keras import backend as K
import numpy
import matplotlib.pyplot as plt
import tensorboard
layer_width = 200
dense_layer_number = 3
def lambda_gradient(args):
    layer = args[0]
    inputs = args[1]
    return tf.gradients(layer, inputs)[0]
# Input is a 2 dimensional vector
inputs = tf.keras.Input(shape=(2,), name="coordinate_input")
# Build `dense_layer_number` times a dense layers of width `layer_width`
stream = inputs
for i in range(dense_layer_number):
    stream = Dense(
        layer_width, activation="relu", name=f"dense_layer_{i}"
    )(stream)
# Build one dense layer that reduces the 200 nodes to a scalar output
# (`custom_activation` is defined elsewhere in my code and omitted here)
scalar = Dense(1, name="network_to_scalar", activation=custom_activation)(stream)
# Take the gradient of the scalar w.r.t. the model input
gradient = Lambda(lambda_gradient, name="gradient_layer")([scalar, inputs])
# Combine them to form the model output
concat = Concatenate(name="concat_scalar_gradient")([scalar, gradient])
# Wrap everything in a model
model = tf.keras.Model(inputs=inputs, outputs=concat)
loss = "MSE"
optimizer = "Adam"
# And compile
model.compile(loss=loss, optimizer=optimizer)
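For reference, a minimal sketch of training such a model with model.fit in graph mode; the arrays X_train and Y_train below are hypothetical placeholders, not my real data (two input coordinates per sample, and three targets: the scalar plus its two partials):

import numpy as np
# Hypothetical placeholder data: 1000 samples of 2-D coordinates,
# 3 targets per sample (scalar value + 2 partial derivatives)
X_train = np.random.rand(1000, 2).astype("float32")
Y_train = np.random.rand(1000, 3).astype("float32")
model.fit(X_train, Y_train, batch_size=32, epochs=10)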
However, the problem comes when I want to do online training (i.e. with an incremental dataset). In this case I don't compile my model at the very end; instead, I write a training loop like this (in place of model.compile):
# ... continue from previous minus model.compile
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()
# Iterate over the batches of a dataset and train.
for i_batch in range(number_of_batches):
    with tf.GradientTape() as tape:
        # Predict w.r.t. the inputs X
        prediction_Y = model(batches_X[i_batch])
        # Compare batch prediction to batch observation
        loss_value = loss_fn(batches_Y[i_batch], prediction_Y)
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
This, however, gives the following exception at prediction_Y = model(batches_X[i_batch]):
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
Since most examples, tutorials, and documentation deal solely with using gradients for training, and not with gradients inside the model itself, I can't find any good resources on how to deal with this. I tried to find out how to use GradientTape instead, but I can't figure out how to use it in the model design phase. Any pointers would be appreciated!
Versions used:
$ python --version
Python 3.8.5
$ python -c "import tensorflow as tf;print(tf.__version__);print(tf.keras.__version__)"
2.2.0
2.3.0-tf
Upvotes: 0
Views: 518
Reputation: 37
I think you are trying to use both eager and graph execution modes simultaneously. Try using tf.config.run_functions_eagerly(False) at the start.
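For illustration, a minimal sketch of how this suggestion could be applied to the loop from the question (the train_step helper is hypothetical, not from the question; note that run_functions_eagerly only affects code wrapped in tf.function, and on TF 2.2 the switch may only be available as tf.config.experimental_run_functions_eagerly):

import tensorflow as tf

# Keep tf.function-compiled code running as graphs rather than eagerly
# (False is the default setting).
tf.config.run_functions_eagerly(False)

# Hypothetical helper: wrapping the train step in tf.function means the forward
# pass (including the tf.gradients-based Lambda layer) is traced in graph mode.
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        prediction = model(x)
        loss_value = loss_fn(y, prediction)
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    return loss_value

for i_batch in range(number_of_batches):
    train_step(batches_X[i_batch], batches_Y[i_batch])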
Upvotes: 0