Ahmad

Reputation: 11

Multiplying the loss function of a Keras model by some constant C, and also dividing its learning rate by C

Is it true that "in Keras, if you multiply the loss function of a model by some constant C and also divide the learning rate by C, the training process will not change"?

I have a model implemented by Keras. I define a loss function as:

def my_loss(y_true, y_est):
    return something

In the first scenario I use an Adam optimizer with a learning rate of 0.005, and I compile the model with that loss function and optimizer. I fit the model on a set of training data and observe that its loss falls from 0.2 to 0.001 in fewer than 100 epochs.
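For reference, the setup in this first scenario looks roughly like the following (model, x_train and y_train stand in for my actual architecture and data):

from tensorflow import keras

# First scenario: original loss, Adam with learning rate 0.005
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.005), loss=my_loss)
model.fit(x_train, y_train, epochs=100)  # loss falls from ~0.2 to ~0.001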

In the second scenario I change the loss function to:

def my_loss(y_true, y_est):
    return 1000 * something

and the learning rate of the optimizer to 0.000005. Then I compile the model with the new loss function and optimizer, and watch what happens to its loss.
In my understanding, since the gradient of the new loss is 1000 times the previous gradient, and the new learning rate is 0.001 times the previous one, the loss in the second scenario should fall from 200 to 1 in fewer than 100 epochs. But surprisingly, I observe that the loss is stuck around 200 and barely decreases.
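My reasoning comes from the plain gradient-descent update, where the two factors cancel exactly; here is a quick NumPy check of that cancellation (theta and grad are just made-up numbers):

import numpy as np

theta = np.array([1.0, -2.0])
grad = np.array([0.3, 0.7])   # gradient of the original loss
lr = 0.005

step_original = theta - lr * grad                   # original loss, lr = 0.005
step_scaled = theta - (lr / 1000) * (1000 * grad)   # loss * 1000, lr / 1000

print(np.allclose(step_original, step_scaled))  # True: identical update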

Does anyone have an explanation for this?

Upvotes: 1

Views: 1293

Answers (1)

Luke

Reputation: 11

If you used SGD, the result would be what you expect. With Adam, however, the scale of the loss has essentially no effect, because Adam normalizes the gradient by its own running estimate of the gradient's magnitude; I suggest studying the Adam update formulas. So in effect you only changed the learning rate, and the new learning rate is too small for your network.
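As a rough sketch of why (standard Adam formulas, not Keras's exact implementation): scaling the gradient by C scales the first moment m by C and the second moment v by C^2, so m / sqrt(v) is essentially unchanged and the step size is governed by the learning rate alone.

import numpy as np

def adam_step(grad, lr, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for a parameter vector; returns the parameter change
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return lr * m_hat / (np.sqrt(v_hat) + eps)

g = np.array([0.3, 0.7])

step1 = adam_step(g, 0.005, m=np.zeros(2), v=np.zeros(2), t=1)            # original loss
step2 = adam_step(1000 * g, 0.000005, m=np.zeros(2), v=np.zeros(2), t=1)  # loss * 1000

print(step1)  # about 0.005 per coordinate
print(step2)  # about 1000x smaller: only the smaller learning rate matters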

Upvotes: 1
