Reputation: 55
I am currently training a model for binary classification. I liked the idea of having two probabilities (one for each of the two classes) that add up to 1, so I used softmax in my output layer and got very high accuracies (up to 99.5%) with very low losses (around 0.007). While researching a bit, I read that binary crossentropy is the only real choice when training on a two-class classification problem.
Now I am confused about whether I have to use categorical_crossentropy as the loss function when I want to use softmax. Could you help me understand which loss function and activation function should be used in a binary classification problem, and why?
Here's my code:
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=input_dim, activation='sigmoid'))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))   # two-unit softmax output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Upvotes: 0
Views: 3325
Reputation: 311
So, if every sample can belong to only one class, then there is no difference between
model.add(Dense(1, activation='sigmoid'))
loss = tf.keras.losses.BinaryCrossentropy()
and
model.add(Dense(2, activation='softmax'))
loss = tf.keras.losses.CategoricalCrossentropy()
As mentioned here, binary crossentropy is just a special case of categorical crossentropy with two classes.
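A quick way to convince yourself (this sketch is my own illustration, not from the linked post): build the two-class softmax from a single logit z, which gives exactly [1 - sigmoid(z), sigmoid(z)], and check that both losses agree on the same toy data:
import numpy as np
import tensorflow as tf

# Toy logits for 4 samples and their binary labels (illustrative values only).
logits = np.array([[2.0], [-1.0], [0.5], [-3.0]], dtype=np.float32)
labels = np.array([1, 0, 1, 0], dtype=np.float32)

# Variant A: single sigmoid output + binary crossentropy.
p = tf.sigmoid(logits[:, 0])
bce = tf.keras.losses.BinaryCrossentropy()(labels, p)

# Variant B: two softmax outputs + categorical crossentropy with one-hot labels.
# softmax([0, z]) == [1 - sigmoid(z), sigmoid(z)], so the probabilities are identical.
two_class_logits = np.concatenate([np.zeros_like(logits), logits], axis=1)
probs = tf.nn.softmax(two_class_logits)
one_hot = tf.one_hot(labels.astype(np.int32), depth=2)
cce = tf.keras.losses.CategoricalCrossentropy()(one_hot, probs)

print(float(bce), float(cce))  # the two values match (up to floating-point precision)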
Upvotes: 5
Reputation: 7985
The loss function depends on the problem type.
For a binary classification problem -> binary_crossentropy
For a multi-class classification problem -> categorical_crossentropy
For a regression problem -> mse (mean squared error)
The activation function also depends on the problem type. Generally, the relu activation function is used in the hidden layers, but for a binary classification problem tanh sometimes performs better; I wouldn't suggest using sigmoid. For the optimizer, Adadelta generally performs better.
The reason for these suggestions is the accuracy metric: the aim is to reach high accuracy, so your model must actually be learning. There are no strict rules, but some methods have been shown to perform better.
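As a minimal sketch of these suggestions applied to the asker's architecture (the layer sizes and the input_dim value are placeholders, the output keeps a sigmoid since a binary classifier still needs a probability output, and whether tanh hidden layers or Adadelta actually help depends on your data):
import tensorflow as tf

input_dim = 20  # placeholder: replace with the number of input features

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(10, input_dim=input_dim, activation='tanh'))  # tanh hidden layers instead of sigmoid
model.add(tf.keras.layers.Dense(10, activation='tanh'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))                     # single probability output
model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])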
Upvotes: 4