Reputation: 61
I understand this decision depends on the task, but let me explain.
I'm designing a model that predicts steering angles from a given dashboard video frame using a convolutional neural network with dense layers at the end. My final dense layer has a single unit that predicts the steering angle.
My question is: for my task, would either option below give a boost in performance?
Option A: Take the ground truth steering angles, convert them to radians, and squash them with tanh so they fall between -1 and 1. In the final dense layer of the network, use a tanh activation function.
Option B: Use the raw ground truth steering angles, which fall between -420 and 420 degrees, and use a linear activation in the final layer.
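Here is roughly what I mean by the two encodings (a NumPy sketch; angles_deg stands in for my array of ground-truth angles in degrees):

    import numpy as np

    # Hypothetical ground-truth steering angles in degrees, within [-420, 420].
    angles_deg = np.array([-420.0, -35.5, 0.0, 12.3, 420.0])

    # Option A: degrees -> radians -> tanh, so targets end up in (-1, 1).
    targets_a = np.tanh(np.deg2rad(angles_deg))

    # Option B: keep the raw degree values and predict them directly.
    targets_b = angles_deg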
My reasoning is that with option A the loss will likely be much smaller, since the network is dealing with much smaller numbers, and this would lead to smaller changes in the weights.
Let me know your thoughts!
Upvotes: 1
Views: 1320
Reputation: 53768
There are two main types of variables in a neural network: weights and biases (mostly; there are additional variables, e.g. the moving mean and moving variance required for batch norm). They behave a bit differently: for instance, biases are not penalized by a regularizer, so they don't tend to get small. So the assumption that the network is dealing only with small numbers is not accurate.
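For example, in a typical Keras layer the penalty is attached to the kernel only, while the bias is left unregularized (a sketch, not taken from your model):

    from tensorflow.keras import layers, regularizers

    # The L2 penalty applies to the kernel (weights) only; bias_regularizer
    # defaults to None, so the biases feel no pressure to shrink.
    dense = layers.Dense(64, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-4))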
Still, biases need to be learned, and as can be seen from the ResNet results, it's easier to learn smaller values. In this sense, I'd rather pick a [-1, 1] target range over [-420, 420]. But tanh is probably not an optimal activation function:
- With tanh (just like with sigmoid), a saturated neuron kills the gradient during backprop. Choosing tanh for no specific reason is likely to hurt your training.
- tanh needs to compute exp, which is also relatively expensive.
My option would be (at least initially, until some other variant proves to work better) to squeeze the ground truth values and have no activation at all (I think that's what you mean by a linear activation): let the network learn the [-1, 1] range by itself.
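If it helps, here is a minimal Keras sketch of that suggestion; the 420-degree limit, the frame size, and the layer sizes are placeholder assumptions rather than a tested architecture:

    import numpy as np
    from tensorflow.keras import layers, models

    MAX_ANGLE_DEG = 420.0  # assumed physical steering limit

    def scale_targets(angles_deg):
        """Squeeze raw angles in [-420, 420] degrees into [-1, 1]."""
        return np.asarray(angles_deg) / MAX_ANGLE_DEG

    model = models.Sequential([
        layers.Input(shape=(66, 200, 3)),              # placeholder frame size
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(1),                               # no activation: linear output
    ])
    model.compile(optimizer="adam", loss="mse")

You would train on scale_targets(raw_degrees) and multiply predictions back by 420 at inference time to recover degrees; option B is essentially the same model trained directly on the unscaled angles.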
In general, if you have any activation functions in the hidden layers, ReLU has proven to work better than sigmoid, though other modern functions have been proposed more recently, e.g. leaky ReLU, PReLU, ELU, etc. You might try any of those.
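In Keras, for example, these alternatives are all readily available (a sketch of hidden-layer choices only, not specific to your model):

    from tensorflow.keras import layers

    # ReLU and ELU can be requested by name:
    relu_layer = layers.Dense(100, activation="relu")
    elu_layer = layers.Dense(100, activation="elu")

    # Leaky ReLU and PReLU ship as standalone layers, applied after a
    # linear Dense layer:
    leaky = [layers.Dense(100), layers.LeakyReLU()]
    prelu = [layers.Dense(100), layers.PReLU()]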
Upvotes: 2