Ian Murray

Reputation: 87

Keras Activation Functions Tanh Vs Sigmoid

I have an LSTM that works with binary data, i.e., the labels are all 0s or 1s.

This would lead me to use a sigmoid activation function, but when I do, it significantly underperforms the same model with a tanh activation function on the same data.

Why would a tanh activation function produce better accuracy even though the labels are not in the (-1, 1) range that tanh outputs?

Sigmoid activation function: Training accuracy: 60.32 % / Validation accuracy: 72.98 %

Tanh activation function: Training accuracy: 83.41 % / Validation accuracy: 82.82 %

All the rest of the code is exactly the same.
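To make the comparison concrete, here is a minimal sketch of the kind of setup described. The layer sizes, sequence length, and feature count are made-up placeholders; the only difference between the two models is the LSTM layer's `activation` argument, while the output layer stays sigmoid for the binary labels.

```python
# Hypothetical sketch: two otherwise-identical binary-classification LSTMs
# that differ only in the LSTM layer's activation function.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_model(activation):
    model = Sequential([
        # Placeholder shape: sequences of 10 timesteps with 4 features each.
        LSTM(32, activation=activation, input_shape=(10, 4)),
        # Binary labels (0/1), so the output layer uses sigmoid either way.
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

sigmoid_model = build_model("sigmoid")
tanh_model = build_model("tanh")
```

Note that `tanh` is Keras's default for the LSTM layer's `activation`; the sigmoid output layer, not the recurrent activation, is what maps predictions into (0, 1) for the 0/1 labels.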

Thanks.

Upvotes: 0

Views: 594

Answers (2)

Ankit Mishra

Reputation: 21

In the interval (0, 1], if the gradient is diminishing over time t, then sigmoid gives the better result; if the gradient is increasing, the tanh activation function does.

Upvotes: 2

Ari K

Reputation: 434

Convergence is usually faster if the average of each input variable over the training set is close to zero, and tanh has zero mean. Is it likely that your data is normalized with a mean near zero?
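The zero-mean point is easy to check numerically. This small sketch (illustrative, not from the original answer) pushes roughly zero-mean inputs through both functions: sigmoid squashes everything into (0, 1), so its outputs average near 0.5 and are always positive, while tanh stays zero-centered.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # roughly zero-mean, unit-variance inputs

sig = 1.0 / (1.0 + np.exp(-x))  # sigmoid: outputs in (0, 1), all positive
tan = np.tanh(x)                # tanh: outputs in (-1, 1), zero-centered

print(sig.mean())  # near 0.5: activations carry a positive bias downstream
print(tan.mean())  # near 0: the zero-mean property the answer refers to
```

The always-positive sigmoid activations feed a constant positive bias into the next layer, which is one common explanation for the slower convergence mentioned above.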

Ref: https://medium.com/analytics-vidhya/activation-functions-why-tanh-outperforms-logistic-sigmoid-3f26469ac0d1

Upvotes: 1
