Reputation: 149
My understanding so far: an activation function is applied to a neuron. What goes into the function is the sum of each connected neuron's value multiplied by the corresponding connection weight. A single value enters the function, and a single value is returned from it. This understanding works fine with tanh and sigmoid.
Now I know how softmax works, that it sums the exponentiated values and everything else related to it. What confuses me is that softmax takes an array of numbers, so I start questioning: what are the sources of the numbers that form this array?
The following picture gives more insight into the question.
Upvotes: 3
Views: 2411
Reputation: 6103
Softmax works on an entire layer of neurons, and must have all their values to compute each of their outputs.
The softmax function looks like softmax_i(v) = exp(v_i) / sum_j(exp(v_j)), where v is the vector of neuron values (in your image, [0.82, 1.21, 0.74]) and exp is just exp(x) = e^x. Thus, exp(v) would be [2.27, 3.35, 2.096]. Divide each of those values by the sum of the entire vector, and you get [0.29, 0.43, 0.27]. These are the activation outputs of your neurons.
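As a quick check, here is a minimal NumPy sketch of that computation (the values [0.82, 1.21, 0.74] are taken from your image; the `softmax` helper is just for illustration):

```python
import numpy as np

def softmax(v):
    # Exponentiate each neuron value, then normalize by the sum,
    # so the outputs are positive and add up to 1.
    e = np.exp(v)
    return e / e.sum()

v = np.array([0.82, 1.21, 0.74])  # neuron values from the image
print(softmax(v))                 # ~[0.29, 0.43, 0.27]
print(softmax(v).sum())           # 1.0
```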
This is useful because the values add up to 1 (forgive the rounding errors in the example above that sum to 0.99... you get the idea), and thus can be interpreted as probabilities, e.g., the probability that an image is one particular class (when it can only belong to one class). That's why the computation needs to know the values of the entire vector of neurons, and can't be computed if you only know the value of a single neuron.
Note that, because of this, you don't usually have another layer after the softmax. Usually the softmax is applied as the activation on your output layer, not on a middle layer like you show. That said, it's perfectly valid to build a network the way you show; you'll just have another weight layer going to your single output neuron, and you'll no longer have any guarantee about what that output value might be. A more typical architecture would be something like 2 neurons -> 3 neurons (sigmoid) -> 4 neurons (softmax), and now you have the probability that your input falls into one of four classes.
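For concreteness, here is a minimal NumPy sketch of that 2 -> 3 (sigmoid) -> 4 (softmax) forward pass; the weights and input values are made up purely to illustrate the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(v):
    e = np.exp(v - v.max())  # subtract max for numerical stability
    return e / e.sum()

# Made-up weights and biases, only to show the 2 -> 3 -> 4 shapes.
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # input (2) -> hidden (3)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden (3) -> output (4)

x = np.array([0.5, -1.0])   # some input with 2 features
h = sigmoid(W1 @ x + b1)    # hidden layer: elementwise activation
p = softmax(W2 @ h + b2)    # output layer: softmax over all 4 neurons

print(p, p.sum())           # four class probabilities summing to 1
```

Notice that the sigmoid is applied to each hidden neuron independently, while the softmax needs the whole output vector at once, which is exactly the point of the answer above.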
Upvotes: 6