Reputation: 14521
My original issue was that I wanted my model to only output values between 0 and 1 so I could map them back to my categorical image labels (Flux.jl restricts these to values between 0 and 1). So I decided to add a sigmoid activation function as follows:
σ = sigmoid
model = Chain(
    resnet[1:end-2],        # we get 2048 features out
    Dense(2048, 1000),
    Dense(1000, 256),
    Dense(256, 2, σ),       # we have 2 classes
);
However, now my model only outputs 1.0. Any ideas as to why, or whether I am using the activation function incorrectly?
Upvotes: 1
Views: 348
Reputation: 1905
Consider using an activation function for your hidden layers, as multiple linear layers (Dense layers without a non-linear activation function) are equivalent to a single linear layer. If your categories are exclusive (dog or cat, but not both) and cover all your cases (it will always be a dog or a cat, never e.g. an ostrich), then the probabilities should sum to one, and a softmax is more appropriate for the last layer. The softmax function is generally used together with the crossentropy loss function.
model = Chain(
    resnet[1:end-2],
    Dense(2048, 1000, σ),   # hidden layers now have a non-linear activation
    Dense(1000, 256, σ),
    Dense(256, 2),          # raw scores (logits) for the 2 classes
    softmax                 # turn the raw scores into probabilities
);
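For instance, the final softmax turns the raw scores into probabilities that sum to one, which crossentropy can then consume. A minimal sketch (the score vector and label here are only illustrative):

using Flux
scores = [2.0, 1.0]                            # raw outputs (logits) of the last Dense layer
probs = softmax(scores)                        # ≈ [0.731, 0.269]; sums to 1
loss = Flux.crossentropy(probs, [1.0, 0.0])    # compare against a one-hot label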
For better numerical stability and accuracy, it is recommended to drop the final softmax and replace crossentropy with logitcrossentropy, which operates on the raw logits (in which case the explicit softmax layer is not necessary).
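Concretely, that means removing softmax from the model and computing the loss on the raw logits. A minimal sketch, assuming resnet is the pretrained network from the question and y holds one-hot labels:

using Flux
model = Chain(
    resnet[1:end-2],
    Dense(2048, 1000, σ),
    Dense(1000, 256, σ),
    Dense(256, 2)           # no softmax: the model now returns raw logits
);
# logitcrossentropy fuses softmax and crossentropy in a numerically stable way
loss(x, y) = Flux.logitcrossentropy(model(x), y)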
Upvotes: 1