Reputation: 13
I'm having trouble correctly adding physics-informed losses to my training code for my neural network.
I have an encoder that takes an input curve X(w), where w is an independent variable (not itself included in the encoder input), and predicts parameters y. The parameters y can be inferred from X(w), and once known they can also be used to calculate X(w) given w. Since the parameters y are known, I'm taking a supervised learning approach to train my encoder like so:
Data-driven Supervised Learning
Note: train_step() is a method of a custom tf.keras.models.Model subclass that isn't included here.
For simplicity, assume that X has a shape of (N, 1000) and y has a shape of (N, 2), where N is the batch size:
def train_step(self, X, y):
    # Define object to compute loss
    loss_object = tf.keras.losses.MeanSquaredError()
    # Define tape for gradient calculations
    with tf.GradientTape() as tape:
        # Get predicted encoder results
        # X -> (N, 1000)
        y_pred = self.encoder(X)  # y, y_pred -> (N, 2)
        # Calculate data-based loss
        total_loss = loss_object(y, y_pred)
    # Compute the gradients from the total_loss
    grads = tape.gradient(total_loss, self.trainable_weights)
    """Apply gradients to optimizer, etc."""
I can theoretically recalculate X(w) from y_pred (given that w is known and constant for all inputs of X) with some known physics, and could define another class method that looks something like this:
def calc_X(w, y0, y1):
    """Known equation here"""
    return X
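(Just to make the shapes concrete, a hypothetical stand-in for calc_X — not my actual physics — could look like this, where each row of parameters generates one curve over w:)
import tensorflow as tf

def calc_X(w, y0, y1):
    # Hypothetical placeholder equation, NOT my real physics:
    # X(w) = y0 * exp(-y1 * w), broadcast over the batch
    # w -> (1000,), y0, y1 -> (N, 1), result -> (N, 1000)
    return y0 * tf.exp(-y1 * w)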
Because I have an actual equation to reconstruct X(w) from y, I figured I could add to the total loss by comparing X_pred to the input X (similar to an autoencoder), and then calculating the gradients like so:
def train_step(self, X, y):
    # Define object to compute loss
    loss_object = tf.keras.losses.MeanSquaredError()
    # Define tape for gradient calculations
    with tf.GradientTape() as tape:
        # Get predicted encoder results
        # X -> (N, 1000)
        y_pred = self.encoder(X)  # y, y_pred -> (N, 2)
        # Get reconstructed input from y_pred
        w = tf.constant(np.arange(0, 1000))
        y0 = y_pred.numpy()[:, 0]
        y1 = y_pred.numpy()[:, 1]
        X_pred = calc_X(w, y0, y1)
        # Define losses
        data_loss = loss_object(y, y_pred)
        reconstruction_loss = loss_object(X, X_pred)
        total_loss = data_loss + reconstruction_loss
    # Calculate gradients
    grads = tape.gradient(total_loss, self.trainable_weights)
    """Apply gradients to optimizer, etc."""
When training with my complete code, I've observed that the calculated gradients are identical whether I use only the data-driven loss or also include the reconstruction_loss. The values of total_loss differ between the two cases, yet the gradients remain the same.
How do I correctly get the reconstruction_loss to contribute to the gradient calculation?
My guess is that the tape from GradientTape is not tracking any of the calculations used to get X_pred, even though y0 and y1 come from the encoder's output. Is it as simple as just calling tape.watch(y0), tape.watch(y1)?
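Or, alternatively, do I need to keep the whole reconstruction in TensorFlow ops so the tape can track the path from the encoder output to X_pred? Just as a sketch of what I mean (using the hypothetical placeholder calc_X from above, not my real equation):
import numpy as np
import tensorflow as tf

def train_step(self, X, y):
    loss_object = tf.keras.losses.MeanSquaredError()
    with tf.GradientTape() as tape:
        y_pred = self.encoder(X)              # (N, 2)
        # Slice with TensorFlow ops so y0/y1 stay connected to the tape
        y0 = y_pred[:, 0:1]                   # (N, 1)
        y1 = y_pred[:, 1:2]                   # (N, 1)
        w = tf.constant(np.arange(1000), dtype=tf.float32)  # (1000,)
        X_pred = calc_X(w, y0, y1)            # (N, 1000), placeholder physics
        data_loss = loss_object(y, y_pred)
        reconstruction_loss = loss_object(X, X_pred)
        total_loss = data_loss + reconstruction_loss
    grads = tape.gradient(total_loss, self.trainable_weights)
    """Apply gradients to optimizer, etc."""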
Upvotes: 1
Views: 35