Reputation: 87
I have an LSTM that uses binary data, i.e. the labels are all 0s and 1s.
This led me to use a sigmoid activation function, but when I do, it significantly underperforms the same model with a tanh activation function on the same data.
Why would a tanh activation function produce better accuracy even though the labels are not in the (-1, 1) range that tanh outputs?
Sigmoid activation: Training accuracy: 60.32 %, Validation accuracy: 72.98 %
Tanh activation: Training accuracy: 83.41 %, Validation accuracy: 82.82 %
All the rest of the code is exactly the same.
Thanks.
Upvotes: 0
Views: 594
Reputation: 21
If the gradient is diminishing over time t within the interval (0, 1], sigmoid tends to give the better result; if the gradient is increasing, the tanh activation function does.
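One concrete way to see why gradients shrink faster through sigmoid than through tanh (a general property of the two functions, not something specific to the asker's model): the derivative of sigmoid peaks at 0.25, while the derivative of tanh peaks at 1.0, so repeated backpropagation through sigmoid scales gradients down much more aggressively. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)), maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # Derivative of tanh: 1 - tanh(x)^2, maximum 1.0 at x = 0
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-4.0, 4.0, 1001)  # grid includes x = 0
print(d_sigmoid(x).max())  # -> 0.25
print(d_tanh(x).max())     # -> 1.0
```

Because each backward step through a sigmoid multiplies the gradient by at most 0.25, gradients through many timesteps vanish faster than with tanh, which can multiply by up to 1.0.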
Upvotes: 2
Reputation: 434
Convergence is usually faster when the average of each variable fed into a layer is close to zero over the training set, and tanh's output is zero-centered. Is it likely your data is normalized with a mean near zero?
Upvotes: 1