yann ziselman

Reputation: 2002

Numerical inconsistency between loop and builtin function

I'm trying to compute the sum of an array of random numbers, but there seems to be an inconsistency between the results when I do it one element at a time and when I use the built-in function. Furthermore, the error seems to increase when I decrease the data precision.

import torch
columns = 43*22
rows    = 44
torch.manual_seed(0)
array = torch.rand([rows,columns], dtype = torch.float64)
array_sum = 0
for i in range(rows):
    for j in range(columns):
        array_sum += array[i, j]

torch.abs(array_sum - array.sum())

results in:

tensor(3.6380e-10, dtype=torch.float64)

using dtype = torch.float32 results in:

tensor(0.1426)

using dtype = torch.float16 results in (a whopping!):

tensor(18784., dtype=torch.float16)

I find it hard to believe no one has ever asked about this, yet I haven't found a similar question on SO.

Can anyone please help me find some explanation or the source of this error?

Upvotes: 3

Views: 269

Answers (3)

lifezbeautiful

Reputation: 1337

You can use float(array[i][j]) in place of array[i][j] to ensure a ~0 difference between the loop-based sum and array.sum(). The ~0 difference is easy to observe once the number of elements is taken into account, as shown in the sketch and the two plots below.
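
For reference, here is a minimal sketch of the modified loop, assuming the same shape and seed as in the question and the float32 case (the printed value will depend on the torch version):

import torch

columns = 43 * 22
rows = 44
torch.manual_seed(0)
array = torch.rand([rows, columns], dtype=torch.float32)

array_sum = 0.0  # a plain Python float, i.e. double precision
for i in range(rows):
    for j in range(columns):
        # cast each element to a Python float so the running sum
        # is accumulated in double precision rather than float32
        array_sum += float(array[i, j])

print(torch.abs(array_sum - array.sum()))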

The heatmaps below show the error per element, i.e. the absolute difference between array.sum() and the loop-based sum, divided by the number of elements. The heatmap value when using an array of r rows and c columns is computed as:

     heatmap[r, c] = torch.abs(array_sum - array.sum())/ (r*c) 

We vary the size of the array to observe how it affects the error per element. With OP's code as-is, the heatmaps show the error accumulating as the matrix grows. However, when we use float(array[i,j]), the error does not depend on the size of the array.

Top Image: when using array_sum += float(array[i][j])

Bottom Image: when using array_sum += (array[i][j])

The script used to generate these plots is reproduced below in case someone wants to fiddle around with it.

import torch
import numpy as np


column_list = range(1,200,10)
row_list = range(1,200,10)

heatmap = np.zeros(shape=(len(row_list), len(column_list)))

for count_r, rows in enumerate(row_list):
    for count_c, columns in enumerate(column_list):
        
        ### OP's snippet start
        torch.manual_seed(0)
        array = torch.rand([rows,columns], dtype = torch.float16)
        array_sum = np.float16(0)
        for i in range(rows):
            for j in range(columns):
                array_sum += (array[i, j])
        ### OP's snippet end

        heatmap[count_r, count_c] = torch.abs(array_sum - array.sum())/ (rows*columns)
        
        
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

X = row_list
Y = column_list
Z = heatmap

df = pd.DataFrame(data=Z, index=X, columns=Y)
sns.heatmap(df, square=False)
plt.xlabel('number of columns')
plt.ylabel('number of rows')
plt.tight_layout()
plt.savefig('image_1.png', dpi=300)
plt.show()   

Upvotes: 2

Richard Boyne

Reputation: 327

You have hit the tip of a rather big iceberg with regard to storing high-precision values in a computer.

There are two concerns here. First, Python itself always stores floating-point values in double precision, so you are casting between two different data types, which leads to some of the odd behaviour. Second is how floating-point numbers work in general (you can read more here).

In general, when you store a number in a float you are "guaranteed" some number of significant figures, say 10; any digits beyond those 10 places carry some error due to the precision they were stored at (often denoted ε). This means that if you sum two numbers that are 10 orders of magnitude apart, ε will be significant in your answer, or (far more likely in this case) you will drop some of the values you care about because the total is much larger than one of the numbers you are adding. Below are some examples of this in numpy:

import numpy as np

val_v_small = np.float64(0.0000000000001)
val_small = np.float64(1.000000001)
val_big = np.float64(10000000)

print(val_big + val_small)  # here we pick up an extra .000000001 from the ε of val_big
>>> 10000001.000000002 

print(val_big + val_v_small)  # here val_v_small is dropped entirely, truncated off val_big
>>> 10000000.0

Upvotes: 0

DwightFromTheOffice

Reputation: 520

The first mistake is this: you should change the summation line to

array_sum += float(array[i, j])

For float64 the original line causes no problems, but for the other dtypes it does; the explanation follows below.

To start with: when doing floating-point arithmetic, you should always keep in mind that there are small errors due to rounding. The simplest way to see this is in a Python shell:

>>> .1+.1+.1-.3
5.551115123125783e-17

But how do you take these errors into account? When summing n positive floating-point numbers to a total tot, the analysis is fairly simple and the rule is:

error(tot) < tot * n * machine_epsilon

Here the factor n is usually a gross over-estimation, and machine_epsilon depends on the type (representation size) of the floating-point number. It is approximately:

float64: 2*10^-16
float32: 1*10^-7
float16: 1*10^-3

So one would generally expect an error within a reasonable factor of tot * machine_epsilon.
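
For reference, a small sketch (not part of the original answer) that prints these machine-epsilon values straight from torch; they agree with the approximate values listed above:

import torch

# eps is the gap between 1.0 and the next representable number in each dtype
for dtype in (torch.float64, torch.float32, torch.float16):
    print(dtype, torch.finfo(dtype).eps)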

And for my tests (always about 40000 values summing to a total of about 20000) we get:

error(float64) = 3*10^-10 ≈ 80* 20000 * 2*10^-16
error(float32) = 1*10^-1  ≈ 50* 20000 * 1*10^-7

which is acceptable.
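
As a rough sanity check, here is a hedged sketch that reproduces the float32 case from the question and compares the observed error of a naive float32 accumulation against the tot * n * machine_epsilon bound; the exact numbers depend on the seed and torch version:

import torch

rows, columns = 44, 43 * 22
torch.manual_seed(0)
array = torch.rand([rows, columns], dtype=torch.float32)

# near-exact reference sum computed in double precision
reference = float(array.double().sum())

# naive accumulation in float32, mimicking the question's loop
naive = torch.zeros((), dtype=torch.float32)
for value in array.flatten():
    naive += value

n = rows * columns
eps = torch.finfo(torch.float32).eps

observed_error = abs(float(naive) - reference)
bound = reference * n * eps   # a gross over-estimate, as noted above

print(f"observed error: {observed_error:.3e}")
print(f"error bound   : {bound:.3e}")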

Then there is another problem with float16. With a machine epsilon of roughly 1e-3 its resolution is quite coarse, and you can see the problem with:

>>> ar = torch.tensor([2048.], dtype=torch.float16)
>>> ar
tensor([2048.], dtype=torch.float16)
>>> ar[0] += .5
>>> ar
tensor([2048.], dtype=torch.float16)

Here the problem is that once the value 2048 is reached, the sum is no longer precise enough to add a value of 1 or less. More specifically: a float16 can 'represent' the value 2048 and the value 2050, but nothing in between, because it has too few bits for that precision. Keeping the sum in a float64 variable overcomes this problem. With this fix we get for float16:

error(float16) = 16  ≈ 0.8* 20000 * 1*10^-3

This is large, but acceptable relative to a value of 20000 represented in float16.
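
To make the saturation effect concrete, here is a small sketch (not from the original answer) that sums 5000 ones in float16 and then with a double-precision accumulator:

import torch

values = torch.ones(5000, dtype=torch.float16)

# accumulating in float16: once the running sum reaches 2048,
# adding 1.0 no longer changes it, because the spacing between
# representable float16 values around 2048 is 2.0
acc16 = torch.zeros((), dtype=torch.float16)
for v in values:
    acc16 += v

# keeping the running sum in a Python float (double precision) avoids this
acc64 = 0.0
for v in values:
    acc64 += float(v)

print(acc16)  # tensor(2048., dtype=torch.float16)
print(acc64)  # 5000.0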

If you ask yourself which of the two methods is 'right', the answer is neither: they are both approximations with the same precision but a different error. But, as you probably guessed, using the sum() method is faster, better and more reliable.

Upvotes: 3
