dain
dain

Reputation: 752

Python: When finding the product of a large array, how best to reduce floating point error?

Suppose I have a large array with a bunch of floats in it, and I need to find the product while losing as little precision as possible to floating point errors:

import numpy as np
randoms = np.random.uniform(0.5, 1.61, 10000)
print(randoms[0:10])

array([ 1.01422339,  0.65581167,  0.8154046 ,  1.49519379,  0.96114304,
    1.20167417,  0.93667198,  0.66899907,  1.26731008,  1.59689486])

A presumably bad approach is to loop through the array and iteratively multiply. This will obviously have an error that compounds with each multiplication, so should be avoided if possible:

product_1 = 1
for i in randoms:
    product_1 = product_1 * i
print(product_1)

64355009.758539267

The next method is to use numpy's built-in prod function, however this returns the exact same value as above, indicating that this is how prod is actually computing it:

product_2 = np.prod(randoms)
print(product_2)

64355009.758539267

print(product_1 == product_2)

True

A third way is to compute the logarithm of every term, sum them, and exponentiate at the end. Each logarithm is computed separately so there isn't the same compounding of the error, but the logarithm process and the exponentiation process both introduce some error themselves. In any case it produces a different answer:

product_3 = np.exp(np.sum(np.log(randoms)))
print(product_3)

64355009.758538999

print(product_3 == product_1)

False

I know that I'm not losing that much precision in this example, but for what I actually need to do, the compounding errors do end up causing troubles, enough that I'm considering using a package that can do symbolic / arbitrary precision computation. So, which method is the best here? Are there other ways I haven't considered?

Upvotes: 1

Views: 356

Answers (1)

casevh
casevh

Reputation: 11424

I tried some experiments. The code is below but first some comments.

It is possible to compute the result exactly by converting the values into exact rational numbers, computing the product exactly, and then performing a final conversion to a float. It can be done with the fractions module included with Python but it will eventually get very slow. I used the gmpy2 module for faster rational arithmetic.

There are some subtleties with the formatting of binary floating-point values for display. Recent versions of Python return the shortest possible decimal string that will result in the original value. numpy floats have a different formatting. And so does the gmpy2.mpfr type. And Decimal obviously used a different formatting rule. So I always convert the result calculated to a Python float.

In addition to the user-definable decimal precision of the Decimal type, I also used gmpy2.mpfr since it supports user-definable binary precision.

The program outputs several values:

  • Sequential multiplication using 53-bit precision (IEEE 64-bit format).
  • Exact value using rational arithmetic.
  • Using Decimal with 28 decimal digits of precision.
  • Using Decimal with a user-specified precision.
  • Using mpfr with a user-specified precision.
  • Using a recursive multiplication method to minimize the number of multiplications.

Here is the code. You can modify the Decimal and mpfr precision and test the accuracy.

import numpy as np
from gmpy2 import mpq, mpfr, get_context, round2
from decimal import Decimal, getcontext

randoms = np.random.uniform(0.5, 1.61, 10000)

# Sequential multiplication using 53-bit binary precision.

product_1 = 1
for i in randoms:
    product_1 = product_1 * i
print("53-bit binary:     ", float(product_1))

# Exact value by converting all floats to fractions and then a final
# conversion to float. Uses gmpy2 for speed.

product_2 = 1
for i in randoms:
    product_2 = product_2 * mpq(i)
print("exact using mpq:   ", float(mpfr(product_2, precision=53)))

# Decimal math with 28 decimal digits (~93 bits of precision.)

product_3 = 1
for i in randoms:
    product_3 = product_3 * Decimal(i)
print("Decimal(prec=28):  ", float(product_3))

# Choose your own decimal precision.

getcontext().prec=18
product_4 = 1
for i in randoms:
    product_4 = product_4 * Decimal(i)
print("Decimal(prec=%s):   %s" % (getcontext().prec, float(product_4)))

# Choose your own binary precision.

get_context().precision = 60
product_5 = 1
for i in randoms:
    product_5 = product_5 * mpfr(i)
print("mpfr(precision=%s): %s" % (get_context().precision, float(product_5)))

# Recursively multiply pairs of numbers together.

def rmult(d):
    if len(d) == 1:
        return d[0]
    # If the length is odd, extend with 1.
    if len(d) & 1:
        d.append(1)
    temp = []
    for i in range(len(d)//2):
        temp.append(d[2*i] * d[2*i+1])
    return rmult(temp)

print("recursive 53-bit:  ", float(rmult(list(randoms))))

As a rough guideline, as the number of multiplications increase, the intermediate precision will need to increase. Rational arithmetic will effectively give you infinite intermediate precision.

How critical is it that the result is 100% accurate?

Upvotes: 1

Related Questions