Reputation: 87
I have an LSTM that uses binary data, i.e. the labels are all 0s and 1s.
This led me to use a sigmoid activation function, but when I do, it significantly underperforms the same model with a tanh activation function on the same data.
Why would a tanh activation function produce better accuracy even though the labels are not in the (-1, 1) range that tanh outputs?
Sigmoid activation: Training accuracy: 60.32 %, Validation accuracy: 72.98 %
Tanh activation: Training accuracy: 83.41 %, Validation accuracy: 82.82 %
All the rest of the code is exactly the same.
Thanks.
Upvotes: 0
Views: 594
Reputation: 21
If the gradient is diminishing over time t within the interval (0, 1], sigmoid tends to give the better result; if the gradient is increasing, the tanh activation function does.
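One concrete way to see why gradients shrink faster through sigmoid than through tanh (a general property of the two functions, not something specific to the asker's model): the derivative of sigmoid peaks at 0.25, while the derivative of tanh peaks at 1.0, so repeated backpropagation through sigmoid scales gradients down much more aggressively. A minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)), maximum 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # Derivative of tanh: 1 - tanh(x)^2, maximum 1.0 at x = 0
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-4.0, 4.0, 1001)  # grid includes x = 0
print(d_sigmoid(x).max())  # -> 0.25
print(d_tanh(x).max())     # -> 1.0
```

Because each backward step through a sigmoid multiplies the gradient by at most 0.25, gradients through many timesteps vanish faster than with tanh, which can multiply by up to 1.0.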
Upvotes: 2
Reputation: 434
Convergence is usually faster when the average of each variable fed into a layer is close to zero over the training set, and tanh's output is zero-centered. Is it likely your data is normalized with a mean near zero?
Upvotes: 1