Dmytro

Reputation: 5213

How do you handle precision problems in Matlab?

I am trying to write a birthday-problem calculator in Matlab, but I am running into a precision issue where (1 - very small floating point number) evaluates to exactly 1.

My current problem is that I want to see how many attempts are needed at guessing a UUID on a website where there are 23,000,000 active session tokens, which have 128 bits of possible unique values, so that the odds of guessing a valid token are over 50%.

I began by simulating the process by:

  1. I set my success_rate to (23,000,000 / (2^128))
  2. I set my failure_rate to (1 - success_rate)

But then I noticed that this value is 1.

Worse, entering (1 - 23000000/(2^128))^n > 0.5 into Wolfram Alpha provides no useful answers.

My first thought was to ditch Matlab completely and write my own library in Java that avoids floating point entirely. It would store each probability as a pair of BigDecimal objects holding the exact ratio, defer all computation to the very last step, and report the result as a minimum/maximum pair, i.e. a range in which the true solution lies. (An exact floating point solution won't generally exist, because division produces values that cannot be represented at the specified precision; the exact answer can still be conveyed by reporting the ratio itself, which stays exact because no division is ever applied to it.)

Is there a way to deal with this kind of issue without having to invent such a system, or are these kinds of problems inherently impossible to solve using floating point systems?

Upvotes: 3

Views: 896

Answers (5)

Neapolitan

Reputation: 2153

As explained by others, 1 - 23000000/2^128 is too close to one for the difference from 1 to be representable in the 53-bit significand of a double precision floating point value, so (1 - 23000000/2^128)^n evaluates to 1 for every n.

Other software packages (Python+SymPy, Mathematica, ...) can perform arbitrary-precision calculations, and there is a Multiprecision Computing Toolbox available for MATLAB. These will allow you to perform the calculation directly.

You can instead rearrange the equation into a binomial expansion:

(a + b)^n = a^n + C(n,1) a^(n-1) b + C(n,2) a^(n-2) b^2 + ...

where C(n,k) is the number of ways of choosing k items from a pool of size n. Since b^k is tiny for larger k, ignore those terms and approximate it as:

(1 - b)^n = 1 - n b + O(b^2)

with b = 23000000/2^128. Solving 1 - n b = 0.5 for n yields the approximation n = 2^128 / (2 * 23000000) given by others.
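Since the linearised equation involves only one ratio, it can even be solved exactly with rational arithmetic. A minimal sketch using Python's stdlib `fractions` module (Python rather than MATLAB, so it runs without extra toolboxes; the variable names are my own):

```python
from fractions import Fraction

# Per-guess success probability, kept as an exact rational (no rounding)
b = Fraction(23_000_000, 2**128)

# Linearised equation: 1 - n*b = 0.5  =>  n = 1/(2*b)
n_approx = Fraction(1, 2) / b   # exactly 2^128 / (2 * 23_000_000)

print(float(n_approx))   # roughly 7.4e30 guesses
```

Because `Fraction` never rounds, the cancellation that defeats doubles simply cannot occur here.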

Herbie can sometimes help you rewrite equations to improve numerical stability.

Another favourite trick is to perform a Taylor expansion near the value you are trying to approximate, giving a polynomial that you can use over a range of inputs. The polynomial degree and the valid range can be determined using a multiprecision library so that you know that your values are accurate to machine precision throughout the range. Wolfram Alpha provides an online Taylor series calculator.

More details can be found in books such as:

  1. Higham NJ. Accuracy and Stability of Numerical Algorithms: Second Edition. SIAM; 2002.

Upvotes: 0

gariepy

Reputation: 3674

The other answers have explained why you can't perform the calculations that you want with the disparity in magnitude of the numbers you are using. However, as I mentioned in a comment, you CAN experiment with smaller numbers to show a trend. Let's call the "projected" value size_of_key_space / (2 * number_of_keys). This is kind of a naive expected value for getting a 50% probability of success. To prove this is in the ballpark, I ran a simulation for a number of different key sets, and key spaces. All are large, with varying sparsity:

function sparse_probability()

num_keys = logspace(2, 5, 15);  % number of keys varies from 1e2 to 1e5
key_spaces = logspace(6, 12, 15);  % size of key space varies from 1e6 to 1e12
% so p_success varies from 1e-4 to 1e-7

num_experiments = length(num_keys);

results = zeros(1,num_experiments);
proportions = zeros(1,num_experiments);

for i = 1:num_experiments
    num_objs = num_keys(i);
    size_of_key_space = key_spaces(i);
    p_success = num_objs/size_of_key_space;
    p_fail = 1 - p_success;

    total_fail = 1;
    num_trials = 0;
    while (total_fail > 0.5)
        total_fail = total_fail * p_fail;
        num_trials = num_trials + 1;
    end
    results(i) = num_trials;
    proportions(i) = num_trials/(size_of_key_space/(2*num_objs));
    fprintf('p_success = %f, num_trials = %d, ratio = %f, num_keys = %e; size key_space = %e\n', 1 - total_fail, num_trials, proportions(i), num_objs, size_of_key_space);
end

Since the sizes of the key set and key space vary significantly, I calculate the ratio of the "projected" value above, and the actual number of trials needed to achieve 50% probability. The output of the function above is:

p_success = 0.500044, num_trials = 6932, ratio = 1.386400, num_keys = 1.000000e+02; size key_space = 1.000000e+06
p_success = 0.500010, num_trials = 11353, ratio = 1.386293, num_keys = 1.637894e+02; size key_space = 2.682696e+06
p_success = 0.500006, num_trials = 18595, ratio = 1.386292, num_keys = 2.682696e+02; size key_space = 7.196857e+06
p_success = 0.500008, num_trials = 30457, ratio = 1.386309, num_keys = 4.393971e+02; size key_space = 1.930698e+07
p_success = 0.500004, num_trials = 49885, ratio = 1.386300, num_keys = 7.196857e+02; size key_space = 5.179475e+07
p_success = 0.500001, num_trials = 81706, ratio = 1.386294, num_keys = 1.178769e+03; size key_space = 1.389495e+08
p_success = 0.500001, num_trials = 133826, ratio = 1.386297, num_keys = 1.930698e+03; size key_space = 3.727594e+08
p_success = 0.500002, num_trials = 219193, ratio = 1.386298, num_keys = 3.162278e+03; size key_space = 1.000000e+09
p_success = 0.500001, num_trials = 359014, ratio = 1.386295, num_keys = 5.179475e+03; size key_space = 2.682696e+09
p_success = 0.500001, num_trials = 588027, ratio = 1.386296, num_keys = 8.483429e+03; size key_space = 7.196857e+09
p_success = 0.500000, num_trials = 963125, ratio = 1.386295, num_keys = 1.389495e+04; size key_space = 1.930698e+10
p_success = 0.500000, num_trials = 1577496, ratio = 1.386294, num_keys = 2.275846e+04; size key_space = 5.179475e+10
p_success = 0.500000, num_trials = 2583771, ratio = 1.386294, num_keys = 3.727594e+04; size key_space = 1.389495e+11
p_success = 0.500000, num_trials = 4231943, ratio = 1.386295, num_keys = 6.105402e+04; size key_space = 3.727594e+11
p_success = 0.500000, num_trials = 6931472, ratio = 1.386294, num_keys = 1.000000e+05; size key_space = 1.000000e+12

If you were to plot the ratio column versus the size of the key space, you would get a flat line: the ratio is essentially constant as long as the key set and key space are several orders of magnitude apart. (In fact the constant is 2*ln(2) ≈ 1.386294: solving (1 - p)^n = 0.5 gives n ≈ ln(2)/p for small p, and dividing by the projected value size_of_key_space/(2*num_keys) with p = num_keys/size_of_key_space leaves exactly 2*ln(2).) Notice that the sparsity varies, but that does not affect the ratio. This is typical of these kinds of sparse probability problems. So from this simple experiment, you can say with VERY high confidence that the number of guesses needed with 2.3e7 keys, in a key space of 2^128 ≈ 3.4e38, is the product of the ratio limit 1.386294 with the projected value, for a total of

1.386294 * (2^128 / (2 * 2.3e7)) = 1.02550305123542e+31 

guesses needed for a 50% chance of guessing a valid UUID.

At 1 trillion guesses a second, it would take about 325 billion years to make that many guesses. In other words, you're safe. :)
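The convergence of the ratio column can also be checked without the trial loop: solving (1 - p)^n = 0.5 in closed form gives n = log(0.5)/log(1 - p). A short Python sketch of the same experiment (the helper name `trials_for_half` is my own):

```python
import math

def trials_for_half(num_keys, key_space):
    """Smallest n with (1 - p)^n < 0.5, via the closed form instead of a loop."""
    p_fail = 1.0 - num_keys / key_space
    return math.ceil(math.log(0.5) / math.log(p_fail))

for num_keys, key_space in [(1e2, 1e6), (1e3, 1e9), (1e5, 1e12)]:
    n = trials_for_half(num_keys, key_space)
    projected = key_space / (2 * num_keys)
    print(n, n / projected)   # ratio settles near 2*ln(2) = 1.386294...
```

The first case reproduces the 6932 trials from the table above, and the ratio matches the 1.386294 limit.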

Upvotes: 0

Simon Byrne

Reputation: 7864

The problem, as pointed out by all the other answers, is that r = 23000000/(2^128) < eps(1)/2, so 1 - r == 1 (and likewise 1 + r == 1).

The easiest way is to rearrange your expression, and exploit some other functions along the way. Rewrite:

(1 - 23000000/(2^128))^n = exp(n*log(1 - 23000000/(2^128)))

Now, this will still have the same problem, but there exists a log1p function for computing log(1+x) accurately. So instead use:

exp(n*log1p(-23000000/(2^128)))
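For reference, the same trick works outside MATLAB: most languages expose a log1p, e.g. Python's math.log1p. A minimal sketch solving for the n that gives a 50% success chance:

```python
import math

p = 23_000_000 / 2**128   # per-guess success probability, ~6.76e-32

# The naive route collapses: 1 - p rounds to 1.0, so log(1 - p) is 0.0.
assert math.log(1.0 - p) == 0.0

# log1p(-p) evaluates log(1 - p) accurately even for tiny p.
n = math.log(0.5) / math.log1p(-p)   # guesses needed for a 50% chance
print(n)   # ~1.03e31, matching the estimates in the other answers
```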

Upvotes: 0

Georgggg

Reputation: 564

Easily explained:

using:

   eps(double(1))

in MATLAB you will find the gap between 1 (stored at its highest precision, double) and the next larger floating point number it can distinguish when performing math operations. In this case the gap is equal to 2.2204e-16.

Since:

success_rate = (23,000,000 / (2^128))

will return 6.7591e-032, which is far smaller than the gap introduced above, so when performing 1 - 6.7591e-032 MATLAB effectively subtracts 0 from 1, and therefore you always get 1 as the answer. Hope it helps.
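The same demonstration in Python, whose sys.float_info.epsilon plays the role of MATLAB's eps(1) (a sketch, since doubles behave identically in both):

```python
import sys

eps = sys.float_info.epsilon        # 2**-52 ≈ 2.2204e-16, gap just above 1.0
success_rate = 23_000_000 / 2**128  # ≈ 6.7591e-32

# The success rate sits ~16 orders of magnitude below the gap,
# so the subtraction rounds straight back to 1.0:
print(success_rate < eps)           # True
print(1.0 - success_rate == 1.0)    # True
```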

Upvotes: 2

Vladislav Martin

Reputation: 1674

... are these kinds of problems inherently impossible to solve using floating point systems?

Short Explanation:

Well, yes by default in MATLAB and no if you use the Symbolic Toolbox in MATLAB.

You can definitely represent very very small numbers with double precision floating point numbers in MATLAB. However, the problem you're encountering has to do with operating on double precision floating point numbers that are too many orders of magnitude apart from each other - in performing calculations, you are limited by the precision of MATLAB calculations.

Thankfully, there is a toolbox to alleviate this problem in the form of the Symbolic Toolbox and variable-precision arithmetic. Look into that if you'd like to get something other than 1 when you perform 1 - (small_value).

Longer Explanation:

http://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html#f2-98720

Double precision floating point numbers in MATLAB cover a pretty impressive range: -1.79769e+308 to -2.22507e-308 and 2.22507e-308 to 1.79769e+308. However, MATLAB only calculates to a maximum precision of 53 significand bits, i.e. integers up to 2^53 = 9.007199255e15, or roughly 16 significant decimal digits.

Here's my explanation of how that could produce the result that you've encountered (1 - small_value = 1):

The number 1.234e12 is represented with roughly 16 significant digits, which means MATLAB can operate on it with an absolute error of roughly 1e-4. Similarly, 2.345e-7 carries a representation error of roughly 1e-23. Thus, adding the two numbers inherits the 1e-4 error, so the smaller number is lost in the rounding error of the calculation MATLAB performs.
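A quick way to see that loss with these sample magnitudes (sketched in Python; MATLAB doubles behave the same way):

```python
big = 1.234e12
small = 2.345e-7

# The spacing between adjacent doubles near 1.234e12 is about 2.4e-4,
# so adding a 1e-7-sized value rounds away entirely:
print(big + small == big)    # True: `small` vanished
print((big + small) - big)   # 0.0, not 2.345e-7
```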

If you don't mind the longer computation times associated with carrying far more than 53 bits of precision, then I would highly recommend that you use the Symbolic Math Toolbox in MATLAB (namely the vpa function).

If my answer doesn't sit well with you, maybe you could check out this answer to a related question in the MATLAB forums. I took parts of my sample numbers from this answer.

Happy coding, I hope this helps!

Upvotes: 2
