Abtin

Reputation: 21

Why is this TensorFlow classification example not using an activation function?

I am trying to follow the instructions provided here to train a binary classifier and use it to make predictions on new images. As far as I know, a sigmoid activation function is usually needed at the end of a binary classification model to limit the outputs to the range between 0 and 1, but this model doesn't have any softmax or sigmoid function:

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1)
])

When I use the model.predict() command to make predictions on new images, the model returns both positive and negative values that are not limited to any range, and I have no idea how to interpret them.

I also tried adding a sigmoid activation function to the last Dense layer, Dense(1, activation='sigmoid'), but this drastically reduced the accuracy.

Can someone help me understand the output of the model?

Upvotes: 1

Views: 72

Answers (1)

Koralp Catalsakal

Reputation: 1124

The default activation function for a Dense layer is the linear (identity) function. If you follow the tutorial, you will see that the model is compiled with a cross-entropy loss using the from_logits=True argument. This tells the loss function that the raw outputs of the Dense(1) layer are logits, which it converts to class probabilities internally when the loss is calculated. So to interpret a prediction yourself, apply a sigmoid to it: a large positive logit maps to a probability near 1, a large negative one to a probability near 0.
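As a minimal sketch (a tiny stand-in model on random inputs, not the tutorial's CNN), this shows the logits-plus-sigmoid pattern:

```python
import tensorflow as tf

# Small stand-in for the tutorial's model: the last layer has no
# activation, so predict() returns raw logits (any real number).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1),  # linear output: raw logits
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))

logits = model.predict(tf.random.normal((3, 8)))  # unbounded values
probs = tf.sigmoid(logits)                        # squashed into (0, 1)
```

The loss applies the sigmoid internally during training; at prediction time you apply it yourself to get probabilities.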

If you switch the activation to sigmoid, you should change your loss function to from_logits=False accordingly, so that the loss function expects values in the range [0, 1].
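The matching setup would look like this (again a minimal stand-in model, not the tutorial's):

```python
import tensorflow as tf

# Sigmoid on the last layer, so the loss must be told it receives
# probabilities rather than logits.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # outputs in (0, 1)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False))

probs = model.predict(tf.random.normal((3, 8)))  # already probabilities
```

The two setups are mathematically equivalent; what caused your accuracy drop was most likely adding the sigmoid while leaving from_logits=True, which makes the loss apply a second sigmoid on top of the first.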

The reason from_logits=True is used in the tutorial is that it produces more numerically stable results (according to TensorFlow's documentation).
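To see what "numerically stable" means here, a small pure-Python illustration (my own sketch of the idea, not TensorFlow's actual implementation; TF documents the same log-sum-exp rewriting for sigmoid cross-entropy):

```python
import math

def naive_bce(logit, label=1.0):
    # Sigmoid first, then log: exp() overflows for very negative logits.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def stable_bce(logit, label=1.0):
    # Equivalent rewriting that never exponentiates a large number:
    # max(x, 0) - x*label + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

x = -1000.0  # extreme negative logit for a positive-class example
try:
    naive = naive_bce(x)
except OverflowError:
    naive = float('inf')  # math.exp(1000) overflows float64

stable = stable_bce(x)  # 1000.0, the correct finite loss
```

For moderate logits the two agree; at the extremes the naive form blows up while the rewritten form stays finite, which is why computing the loss from logits is preferred.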

Upvotes: 2
