Edward h

Reputation: 21

Octave -inf and NaN

I searched the forum and found the thread "Two ways around -inf", but it does not cover my question.

In week 3 of a machine learning class, I am getting -Inf when computing log(0), which later turns into a NaN. The NaN makes the sum in the cost formula come out as NaN, so I get no scalar value for J (the cost function, which is computed with matrix math).

Here is a test of my sigmoid function:

>> sigmoid([-100;0;100])
ans =
3.7201e-44
5.0000e-01
1.0000e+00

This is as expected, but the cost function also requires 1 - sigmoid:

>> 1-ans
ans =
1.00000
0.50000
0.00000

and log(0) gives -Inf:

>> log(ans)
ans =
0.00000
-0.69315
-Inf

The -Inf rows should not add anything to the cost function (their term is multiplied by zero), but 0 * -Inf evaluates to NaN, which then carries through the sum, and I do not get a result. I cannot find any material on -Inf, and I suspect there is a problem with my sigmoid function.
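A minimal sketch of how the NaN could arise, assuming the usual vectorized logistic-regression cost J = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h)). The labels `y` and the anonymous `sigmoid` below are hypothetical, chosen only to reproduce the situation, and are not taken from the course code:

```octave
% Illustrative sigmoid as an anonymous function (hypothetical, not the course file)
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-100; 0; 100];   % example inputs from the question
y = [0; 1; 1];        % hypothetical labels
h = sigmoid(z);       % h(3) is exactly 1 in double precision
m = numel(y);

% Term-by-term contributions to the cost
terms = y .* log(h) + (1 - y) .* log(1 - h);
% Row 3: (1 - y(3)) * log(1 - h(3)) = 0 * log(0) = 0 * -Inf = NaN

% A single NaN propagates through sum, so J is NaN
J = -(1 / m) * sum(terms)
```

Rows 1 and 2 are finite; only row 3, where a zero weight meets log(0), produces the NaN.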

Can you provide any direction?

Upvotes: 2

Views: 3764

Answers (3)

Tasos Papastylianou

Reputation: 22225

Adding to the answers here, I do hope you will provide some more context to your question (in particular, what you are actually trying to do).

I will go out on a limb and guess the context, just in case this is useful. You are probably doing machine learning, and trying to define a cost function based on the negative log likelihood of a model, and then trying to differentiate it to find the point where this cost is at its minimum.

In general, for a reasonable model with a useful likelihood that adheres to Cromwell's rule (never assign a probability of exactly 0 or 1 to anything that is not logically certain), you shouldn't have these problems, but in practice it happens. Presumably, in the process of calculating the negative log likelihood of a zero probability you get inf, and calculating a differential between two such points produces something like inf - inf or inf / inf, both of which give nan.
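The IEEE floating-point rules behind this can be checked directly at the Octave prompt; a short sketch (variable names are arbitrary):

```octave
% How a single -Inf turns into NaN under IEEE 754 arithmetic
a = log(0)          % -Inf
b = 0 * a           % NaN: zero times infinity is undefined
c = a - a           % NaN: -Inf - (-Inf) is undefined
d = a / a           % NaN: Inf / Inf is undefined
s = sum([1, 2, b])  % NaN: one NaN poisons the whole sum
```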

This is an 'edge case', and in computer science edge cases generally need to be detected as exceptional circumstances and dealt with appropriately. In reality, you can reasonably expect that inf isn't going to be your function's minimum! Therefore, whether you remove it from the calculations or replace it with a very large number (chosen arbitrarily or based on machine precision) doesn't really make a difference.

So in practice you can do either of the two things suggested by others here, or even just detect such instances and skip them from the calculation. The practical result should be the same.
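The "skip" and "replace" options might look like the sketch below, again assuming the standard logistic-regression cost and hypothetical labels `y`; the value of `big` is arbitrary and purely illustrative:

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
z = [-100; 0; 100];
y = [0; 1; 1];          % hypothetical labels
h = sigmoid(z);
m = numel(y);

% Option 1: detect non-finite terms and skip them
terms = y .* log(h) + (1 - y) .* log(1 - h);
ok = isfinite(terms);
J_skip = -sum(terms(ok)) / m

% Option 2: replace -Inf log values by a large finite negative number
% before multiplying, so that 0 * big = 0 instead of NaN
big = -1e6;             % arbitrary, illustrative value
logh = log(h);        logh(isinf(logh)) = big;
log1mh = log(1 - h);  log1mh(isinf(log1mh)) = big;
J_replace = -sum(y .* logh + (1 - y) .* log1mh) / m
```

Both give the same J here. One caveat on skipping: if a -Inf term has a nonzero weight (a confidently wrong prediction), skipping it also hides a genuinely large cost, whereas replacement keeps it large.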

Upvotes: 1

Cris Luengo

Reputation: 60514

The typical way to avoid infinity in these cases is to add eps to the operand:

log(ans + eps)

eps (machine epsilon, about 2.2204e-16 in double precision) is a very, very small value, and won't noticeably affect the output unless ans is zero or very close to it:

>> z = [-100;0;100];
>> g = 1 ./ (1+exp(-z));
>> log(1-g + eps)
ans =
    0.0000
   -0.6931
  -36.0437
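Applied to the whole cost function, the same trick might look like this minimal sketch; the vectorized logistic-regression cost and the labels `y` are assumptions for illustration:

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
z = [-100; 0; 100];
y = [0; 1; 1];        % hypothetical labels
h = sigmoid(z);
m = numel(y);

% Adding eps keeps both log arguments strictly positive,
% so neither log returns -Inf and no 0 * -Inf NaN appears
J = -(1 / m) * sum(y .* log(h + eps) + (1 - y) .* log(1 - h + eps))
```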

Upvotes: 2

Spoonless

Reputation: 571

-inf means negative infinity, which is the correct answer, because the log of 0 is minus infinity by definition.

The easiest thing to do is to check your intermediate results and, if a value is below some threshold (like 1e-12), set it to that threshold. The answers won't be exact, but they will still be pretty close.

Using the following as the sigmoid function:

function g = sigmoid(z)
  % Element-wise logistic function; exp(-z) works in both Octave and MATLAB
  g = 1 ./ (1 + exp(-z));
end

With this, the following code runs with no issues. Choose the threshold value in the max statement to be less than the expected noise in your measurements, and you're good to go.

>> a = sigmoid([-100, 0, 100])
a =

   3.7201e-44   5.0000e-01   1.0000e+00

>> b = 1-a
b =

   1.00000   0.50000   0.00000

>> c = max(b, 1e-12)
c =

   1.0000e+00   5.0000e-01   1.0000e-12

>> d = log(c)
d =

    0.00000   -0.69315  -27.63102
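The clamp could also be folded into the cost function itself. A minimal sketch, assuming the standard logistic-regression cost and hypothetical labels `y`; clamping h as well as 1 - h protects the other log term in the same way:

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
z = [-100, 0, 100];
y = [0, 1, 1];        % hypothetical labels
h = sigmoid(z);

threshold = 1e-12;    % pick below the expected noise level
% Clamp both log arguments to at least the threshold
logh   = log(max(h, threshold));
log1mh = log(max(1 - h, threshold));

J = -mean(y .* logh + (1 - y) .* log1mh)
```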

Upvotes: 0
