Reputation: 45
I am going through the GitHub source code for Softmax activation function. I have a few questions regarding the code.
1. m = x.max(1)[:, np.newaxis] is used to find the maximum of the given array. Why is np.newaxis needed in this expression?
2. u = np.exp(x - m) is used, but to my knowledge it should have been u = np.exp(x). What implementation detail am I missing?
3. z = u.sum(1)[:, np.newaxis] also uses np.newaxis. What is its use here?

For reference, here is the link to the GitHub repo where this function is defined.
Upvotes: 1
Views: 106
Reputation: 724
The function under discussion is as follows:
def softmax(x):
    assert len(x.shape) == 2
    m = x.max(1)[:, np.newaxis]
    u = np.exp(x - m)
    z = u.sum(1)[:, np.newaxis]
    return u / z
As the assert statement suggests, the softmax function must be applied to a 2D array, and it is applied so that each row of the result u / z sums to one. That's why the methods max and sum are applied row-wise, i.e. with parameter axis=1.
np.newaxis
For each row x[i] of x, we want to compute np.exp(x[i]) / np.sum(np.exp(x[i])). Here the normalization term np.sum(np.exp(x[i])) is a number, while the term np.exp(x[i]) is a 1D array. Thanks to numpy's broadcasting rules, the division can be performed: the number is applied to every element of the array.
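As a minimal sketch of that per-row computation, an explicit loop could look like the following. The function name softmax_by_rows and the sample input are made up for illustration and do not come from the repository.

```python
import numpy as np

def softmax_by_rows(x):
    # Hypothetical helper: process one row at a time with an explicit loop.
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        e = np.exp(x[i])          # 1D array with one entry per column
        out[i] = e / np.sum(e)    # the scalar sum broadcasts over the 1D array
    return out

# Illustrative input, not taken from the original post.
x = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
p = softmax_by_rows(x)
print(p.sum(axis=1))  # each row sums to 1, up to rounding
```

The loop is only meant to show what is being computed; the vectorized version avoids it.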
Now, iterating over the rows of x can be avoided thanks to numpy. Let's take the following array as an example for np.exp(x).
u = np.array([[ 9,  6, 13, 19,  8],
              [ 2, 17, 18,  0, 13],
              [ 8,  3,  2, 18, 10]])  # np.exp(x)
u.sum(axis=1)  # normalization term: array([55, 50, 41])
The aim is to divide each row of u by the corresponding value of the normalization term u.sum(axis=1). However, broadcasting rules do not allow the two terms to be divided directly, since u has shape (3, 5), while the normalization array has shape (3,). As numpy's documentation indicates:
Two dimensions are compatible when
- they are equal, or
- one of them is 1
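Shapes are compared starting from the trailing dimension. So u can be divided by arrays of shape (3, 5), (1, 5), (3, 1), (5,) or (), but not by u.sum(1) of shape (3,), whose only dimension 3 would be compared with the trailing dimension 5 of u.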
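The compatibility rule can be checked directly. The sketch below reuses the example array u from above; the arrays of ones are only placeholders used to try out each shape.

```python
import numpy as np

u = np.array([[9, 6, 13, 19, 8],
              [2, 17, 18, 0, 13],
              [8, 3, 2, 18, 10]])
z = u.sum(axis=1)  # shape (3,)

try:
    u / z          # trailing dimensions are 5 and 3: not compatible
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)

# Every shape from the list above broadcasts against (3, 5).
shapes_ok = []
for shape in [(3, 5), (1, 5), (3, 1), (5,), ()]:
    result = u / np.ones(shape)   # placeholder divisor of the given shape
    shapes_ok.append(result.shape == (3, 5))
    print(shape, "->", result.shape)
```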
That's why the index operator newaxis is used to insert a new axis into the normalization term, making it two-dimensional with shape (3, 1).
u.sum(axis=1)[:, np.newaxis] # array([[55], [50], [41]])
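For comparison, numpy offers other ways to obtain the same (3, 1) column. The sketch below uses the keepdims argument of sum and reshape as alternatives; these are not what the repository code uses, only equivalent spellings.

```python
import numpy as np

u = np.array([[9, 6, 13, 19, 8],
              [2, 17, 18, 0, 13],
              [8, 3, 2, 18, 10]])

z_newaxis = u.sum(axis=1)[:, np.newaxis]     # approach used in the original code
z_keepdims = u.sum(axis=1, keepdims=True)    # keeps the summed axis with length 1
z_reshape = u.sum(axis=1).reshape(-1, 1)     # explicit reshape to a column

print(z_newaxis.shape, z_keepdims.shape, z_reshape.shape)

normalized = u / z_keepdims                  # (3, 5) / (3, 1) broadcasts row-wise
print(normalized.sum(axis=1))                # each row sums to 1, up to rounding
```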
Finally, a softmax function on rows would be:
def softmax(x):
    assert x.ndim == 2  # ndarray exposes ndim, not dim
    u = np.exp(x)
    z = u.sum(axis=1)[:, np.newaxis]
    return u / z
However, applying this function to large values can be numerically unstable, since np.exp(x) may overflow. Note that subtracting or adding any constant to every entry of a row does not change the result, because the same factor appears in both the numerator and the normalization term and cancels. That's why the maximum of each row, m, is subtracted, so that all values are at most zero before the exponential function is applied.
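A small experiment can illustrate both points. The function names softmax_naive and softmax_stable and the inputs are chosen here for illustration; the stable version follows the same steps as the function under discussion, written with keepdims instead of np.newaxis.

```python
import numpy as np

def softmax_naive(x):
    # No shift: np.exp may overflow for large inputs.
    u = np.exp(x)
    return u / u.sum(axis=1, keepdims=True)

def softmax_stable(x):
    # Subtract the row maximum first, so every exponent is <= 0.
    m = x.max(axis=1, keepdims=True)
    u = np.exp(x - m)
    return u / u.sum(axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = small + 1000.0  # same row shifted by a constant

# On moderate values both versions agree.
same_on_small = np.allclose(softmax_naive(small), softmax_stable(small))

# On large values the naive version overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)

# The stable version returns the same probabilities as for the unshifted row.
stable_large = softmax_stable(large)
print(same_on_small, naive_large, stable_large)
```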
Upvotes: 4