Reputation: 539
I am learning about neural networks and implementing one in Python. I first define a softmax function, following the solution given in this question: Softmax function - python. The following is my code:
import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    s = 0
    e = np.exp(A)
    s = e / np.sum(e, axis=0)
    return s
I was given test code to check whether the softmax function is correct: test_array is the test data and test_output is the expected output of softmax(test_array). The following is the test code:
# Test if your function works correctly.
test_array = np.array([[0.101,0.202,0.303],
[0.404,0.505,0.606]])
test_output = [[ 0.30028906, 0.33220277, 0.36750817],
[ 0.30028906, 0.33220277, 0.36750817]]
print(np.allclose(softmax(test_array),test_output))
However, with the softmax function that I defined, softmax(test_array) returns:
print(softmax(test_array))
[[ 0.42482427  0.42482427  0.42482427]
 [ 0.57517573  0.57517573  0.57517573]]
Could anyone point out what is wrong with the softmax function that I defined?
Upvotes: 3
Views: 5380
Reputation: 179
You can print np.sum(e, axis=0) yourself. You will see that it is an array with 3 elements, [ 2.60408059 2.88083353 3.18699884]. Then e / np.sum(e, axis=0) divides each row of e (itself a 3-element array) by those column sums, so every column of e gets normalized instead of every row. Apparently that is not what you want.
You should change np.sum(e, axis=0) to np.sum(e, axis=1, keepdims=True), so that you get
[[ 3.68403911]
 [ 4.98787384]]
instead, which is what you actually want, and then you will get the right result.
I also recommend reading the rules of broadcasting in NumPy. They describe how addition/subtraction/multiplication/division works on two arrays with different shapes.
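For example, here is a minimal sketch of the two broadcasting behaviours, using the exponentials of the test array from the question:
import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))   # shape (2, 3)

# axis=0 sums down the columns -> shape (3,), so e / sums normalizes each column
print(np.sum(e / np.sum(e, axis=0), axis=0))                  # [1. 1. 1.]

# axis=1 with keepdims sums across each row -> shape (2, 1), normalizing each row
print(np.sum(e / np.sum(e, axis=1, keepdims=True), axis=1))   # [1. 1.]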
Upvotes: 2
Reputation: 3077
The problem is in your sum. You are summing along axis 0, where you should keep axis 0 untouched. To sum over all the entries in the same example, i.e. in the same row, you have to use axis 1 instead.
def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)
Use keepdims to preserve the shape and be able to divide e by the sum.
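A quick sketch of what keepdims changes, assuming the same (2, 3) test array from the question:
import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))

print(np.sum(e, axis=1).shape)                 # (2,)   -- dividing e by this raises a broadcasting error
print(np.sum(e, axis=1, keepdims=True).shape)  # (2, 1) -- broadcasts cleanly against e's shape (2, 3)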
In your example, e evaluates to:
[[ 1.10627664 1.22384801 1.35391446]
[ 1.49780395 1.65698552 1.83308438]]
then the sum for each example (the denominator in the return line) is:
[[ 3.68403911]
[ 4.98787384]]
The function then divides each row by its sum and gives the result you have in test_output.
As MaxU pointed out, it is good practice to subtract the maximum before exponentiating, in order to avoid overflow:
e = np.exp(A - np.max(A, axis=1, keepdims=True))
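As a rough illustration with some hypothetical large inputs, the plain exponential overflows to inf while the max-shifted version stays finite:
import numpy as np

A = np.array([[1000.0, 1001.0, 1002.0]])         # hypothetical large inputs

print(np.exp(A))                                  # [[inf inf inf]] (with an overflow warning)
e = np.exp(A - np.max(A, axis=1, keepdims=True))
print(e / np.sum(e, axis=1, keepdims=True))       # [[0.09003057 0.24472847 0.66524096]]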
Upvotes: 3
Reputation: 210822
Try this:
In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return e / e.sum(axis=1).reshape((-1, 1))

In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])
or, better, this version, which prevents overflow when large values are exponentiated:
def softmax(A):
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return e / e.sum(axis=1).reshape((-1, 1))
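As a quick sanity check, reusing test_array and test_output from the question, the stabilized version still matches the expected output:
print(np.allclose(softmax(test_array), test_output))   # True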
Upvotes: 2
Reputation: 27201
Perhaps this may be enlightening:
>>> np.sum(test_output, axis=1)
array([ 1., 1.])
Notice that each row is normalized. In other words, they want you to compute softmax for each row independently.
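A minimal sketch of that per-row check, assuming the axis=1 version of softmax from the other answers:
import numpy as np

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

def softmax(A):
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

print(np.sum(softmax(test_array), axis=1))   # [1. 1.] -- each row is its own probability distribution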
Upvotes: 0