Jassy.W

Reputation: 539

Softmax function in neural network (Python)

I am learning about neural networks and implementing one in Python. I first defined a softmax function, following the solution given in this question: Softmax function - python. Here is my code:

import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    s = e / np.sum(e, axis=0)
    return s

I was given test code to check whether the softmax function is correct. test_array is the test data and test_output is the expected output of softmax(test_array). Here is the test code:

# Test if your function works correctly.
test_array = np.array([[0.101,0.202,0.303],
                       [0.404,0.505,0.606]]) 
test_output = [[ 0.30028906,  0.33220277,  0.36750817],
               [ 0.30028906,  0.33220277,  0.36750817]]
print(np.allclose(softmax(test_array),test_output))

However, with the softmax function that I defined, testing the data with softmax(test_array) returns:

print(softmax(test_array))

[[ 0.42482427  0.42482427  0.42482427]
 [ 0.57517573  0.57517573  0.57517573]]

Could anyone point out what is wrong with the softmax function I defined?

Upvotes: 3

Views: 5380

Answers (4)

daquexian

Reputation: 179

Print np.sum(e, axis=0) yourself and you will see it is an array with 3 elements: [ 2.60408059 2.88083353 3.18699884]. Then e / np.sum(e, axis=0) divides each row of e element-wise by that 3-element array, which is apparently not what you want.

You should change np.sum(e, axis=0) to np.sum(e, axis=1, keepdims=True), so that you get

[[ 3.68403911]
 [ 4.98787384]]

instead, which is the per-row sum you actually want, and the division then gives the right result.

I also recommend reading the rules of broadcasting in NumPy. They describe how addition, subtraction, multiplication, and division work on two arrays of different shapes.
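
To illustrate, here is a minimal sketch of the broadcasting behavior in question, using the (2, 3) array from the question:

import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))

# Summing along axis 0 collapses the rows into a shape-(3,) array.
# Broadcasting then divides every row of e by that same array,
# i.e. it normalizes the columns instead of the rows.
print((e / np.sum(e, axis=0)).sum(axis=0))  # [ 1.  1.  1.]

# Summing along axis 1 with keepdims=True gives shape (2, 1).
# Broadcasting divides each row of e by its own sum, as intended.
print((e / np.sum(e, axis=1, keepdims=True)).sum(axis=1))  # [ 1.  1.]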

Upvotes: 2

grovina

Reputation: 3077

The problem is in your sum. You are summing along axis 0, when you should be leaving axis 0 untouched.

To sum over all the entries in the same example, i.e., in the same row, you have to use axis 1 instead.

def softmax(A):
    """
    Computes a softmax function. 
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

Use keepdims=True to preserve the shape, so that e can be divided by the sum directly.
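
As a small illustration of what keepdims changes (a sketch using the test data from the question):

import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))

print(np.sum(e, axis=1).shape)                 # (2,)
print(np.sum(e, axis=1, keepdims=True).shape)  # (2, 1)

# The (2, 1) sum broadcasts against the (2, 3) array row by row,
# so each row of e is divided by its own sum. Without keepdims,
# the (2,) sum cannot broadcast against (2, 3) (trailing dimensions
# 3 and 2 do not match) and e / np.sum(e, axis=1) raises a ValueError.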

In your example, e evaluates to:

[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]

then the sum for each example (denominator in the return line) is:

[[ 3.68403911]
 [ 4.98787384]]

The function then divides each row by its sum and gives the result you have in test_output.

As MaxU pointed out, it is good practice to subtract the maximum before exponentiating, in order to avoid overflow:

e = np.exp(A - np.max(A, axis=1, keepdims=True))
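
A quick sketch of why this matters, assuming inputs with large values:

import numpy as np

A = np.array([[1000.0, 1001.0, 1002.0]])

# The naive version overflows: np.exp(1000.) is inf, and inf / inf is nan.
e = np.exp(A)  # RuntimeWarning: overflow encountered in exp
print(e / np.sum(e, axis=1, keepdims=True))  # [[ nan  nan  nan]]

# Subtracting the row maximum keeps every exponent <= 0, so nothing overflows.
e = np.exp(A - np.max(A, axis=1, keepdims=True))
print(e / np.sum(e, axis=1, keepdims=True))  # [[ 0.09003057  0.24472847  0.66524096]]

Since subtracting a constant from every entry of a row does not change the softmax value, the stable version computes exactly the same function.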

Upvotes: 3

MaxU - stand with Ukraine

Reputation: 210822

Try this:

In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return  e / e.sum(axis=1).reshape((-1,1))

In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])

or, better, this version, which prevents overflow when large values are exponentiated:

def softmax(A):
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return  e / e.sum(axis=1).reshape((-1,1))

Upvotes: 2

Mateen Ulhaq

Reputation: 27201

Perhaps this may be enlightening:

>>> np.sum(test_output, axis=1)
array([ 1.,  1.])

Notice that each row is normalized. In other words, they want you to compute softmax for each row independently.
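
This property gives a quick sanity check for any implementation (a minimal sketch, assuming the corrected row-wise softmax from the answers above):

import numpy as np

def softmax(A):
    e = np.exp(A - np.max(A, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

test_array = np.array([[0.101, 0.202, 0.303],
                       [0.404, 0.505, 0.606]])

# Each row of the output should sum to 1.
print(np.sum(softmax(test_array), axis=1))  # [ 1.  1.]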

Upvotes: 0
