Can I use step function as loss function to train Neural Network?

Question

As title,

I try to build model for predict PM2.5,

It's possible use loss function with gradient decent , such as mse,rmse,mae...etc.

but when I use custom loss function with step function, I seems weights didn't update.

in my model last layers , is output pm2.5 prediction,

and I try to use step function calculate loss

def custom_loss(y_true,y_pred):
  z_true = step_function(y_true)
  z_pred = step_function(y_pred)
  return K.abs(z_true -z_pred)

and my step function is try to transform PM2.5 to AQI Levels.

def step_function(x):
  step1 = ((K.tanh(x-15.45))+1)/2  # is means PM2.5 <15.45 return 0 >15.45 return 1 
  step2 = ((K.tanh(x-35.45))+1)/2  # is means PM2.5 <35.45 return 0 >35.45 return 1 
  return (step1+step2)  # if x(PM2.5) = 50 , will return 2

is possible when y_true and y_pred are equals 0 , and step function return 0 , can't differential so occur weights did't update?

user11530462 · Accepted Answer

As you have correctly mentioned, you have to handle Loss when its 0 else there is nothing for optimizer to minimize. Thus weights of the model also won't update. So ideal way in this case is to keep track of training loss at step level using custom training.

You will have more control with custom training. If you want lower-level over your training and evaluation loops than what fit() and evaluate() provide, you should write your own training loop. It's actually pretty simple. But you should be ready to have a lot more debugging to do on your own.

Calling a model inside a GradientTape scope enables you to retrieve the gradients of the trainable weights of the layer with respect to a loss value. Using an optimizer instance, you can use these gradients to update these variables (which you can retrieve using model.trainable_weights).

TensorFlow provides the tf.GradientTape API for automatic differentiation - computing the gradient of a computation with respect to its input variables. Tensorflow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". Tensorflow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a "recorded" computation using reverse mode differentiation.

If you want to process the gradients before applying them you can instead use the optimizer in three steps:

Compute the gradients with tf.GradientTape.
Process the gradients as you wish.
Apply the processed gradients with apply_gradients().

Here is a simple example for mnist data. The comments are present in the code to explain better.

Code-

import tensorflow as tf
print(tf.__version__)
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

# Get the model.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

epochs = 3
for epoch in range(epochs):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the layer.
      # The operations that the layer applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train, training=True)  # Logits for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * 64))

Output -

2.2.0
Start of epoch 0
Training loss (for one batch) at step 0: 2.323657512664795
Seen so far: 64 samples
Training loss (for one batch) at step 200: 2.3156163692474365
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 2.2302279472351074
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 2.131979465484619
Seen so far: 38464 samples
Start of epoch 1
Training loss (for one batch) at step 0: 2.00234317779541
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.7992427349090576
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.8583933115005493
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.6005337238311768
Seen so far: 38464 samples
Start of epoch 2
Training loss (for one batch) at step 0: 1.6701987981796265
Seen so far: 64 samples
Training loss (for one batch) at step 200: 1.6237502098083496
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 1.3603084087371826
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 1.246948480606079
Seen so far: 38464 samples

You can find more about tf.GradientTape here. The example used here is taken from here.

Hope this answers your question. Happy Learning.

Can I use step function as loss function to train Neural Network?

Answers (1)

Related Questions