logankilpatrick

Reputation: 14521

Flux.jl model always outputs 1.0 after adding Sigmoid activation function

My original issue was that I wanted my model to only output values between 0 and 1 so I could map them back to my categorical image labels (see "Flux.jl restrict variables between 0 and 1"). So I decided to add a sigmoid activation function as follows:

σ = sigmoid

model = Chain(
  resnet[1:end-2],          # backbone outputs 2048 features
  Dense(2048, 1000),
  Dense(1000, 256),
  Dense(256, 2, σ),         # we have 2 classes
);

However, now my model only outputs 1.0. Any ideas why, or am I using the activation function incorrectly?

Upvotes: 1

Views: 348

Answers (1)

Alex338207

Reputation: 1905

Consider using an activation function for your hidden layers: multiple linear layers (Dense layers without a non-linear activation function) are equivalent to a single linear layer. If your categories are mutually exclusive (dog or cat, but not both) and cover all your cases (it will always be a dog or a cat, never e.g. an ostrich), then the probabilities should sum to one, and a softmax is more appropriate for the last layer. The softmax function is generally used with the crossentropy loss function.

model = Chain(
  resnet[1:end-2],
  Dense(2048, 1000, σ),   # σ on the hidden layers adds the needed non-linearity
  Dense(1000, 256, σ),
  Dense(256, 2),
  softmax                 # turns the 2 outputs into probabilities that sum to 1
);

For better numerical stability and accuracy, it is recommended to replace softmax and crossentropy by logitcrossentropy, which operates directly on the raw logits (in which case the softmax layer is not necessary).

Upvotes: 1
