wsdzbm

Reputation: 3670

accuracy of float32

To reduce the filesize, I'm trying to save float64 data to file in float32. The data values generally range from 1e-12 to 10. I tested the accuracy loss when converting float64 to float32.

print(np.finfo('float32'))

shows

Machine parameters for float32
---------------------------------------------------------------
precision=  6   resolution= 1.0000000e-06
machep=   -23   eps=        1.1920929e-07
negep =   -24   epsneg=     5.9604645e-08
minexp=  -126   tiny=       1.1754944e-38
maxexp=   128   max=        3.4028235e+38
nexp  =     8   min=        -max
---------------------------------------------------------------

It looks like float32 has a resolution of 1e-6, and absolute values remain valid down to about 1.2e-38.
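
These figures are also available programmatically, through the documented attributes of np.finfo:

import numpy as np

info = np.finfo(np.float32)
print(info.eps)   # 1.1920929e-07, spacing between 1.0 and the next float32
print(info.tiny)  # 1.1754944e-38, smallest positive *normal* float32
print(info.max)   # 3.4028235e+38, largest finite float32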

import numpy as np

x = 2.0*np.random.rand(100) - 1.0  # random float64 values in [-1, 1]

print('x.dtype: %s' % x.dtype)  # outputs float64

print('number:    max_error       max_relative_error')
for i in range(-40, 1):
    y = x * 10.0**i
    # maximum round-trip error through float32, absolute and
    # relative to the 10**i scale of the values
    err = np.max(np.abs(y - y.astype('f4').astype('f8')))
    print('1e%-4d:    %e    %e' % (i, err, err / 10.0**i))

The results are

number:    max_error       max_relative_error
1e-40 :    6.915620e-46    6.915620e-06
1e-39 :    6.910361e-46    6.910361e-07
1e-38 :    6.949349e-46    6.949349e-08
1e-37 :    4.816590e-45    4.816590e-08
1e-36 :    4.303771e-44    4.303771e-08
1e-35 :    3.518621e-43    3.518621e-08
1e-34 :    5.165854e-42    5.165854e-08
1e-33 :    3.660088e-41    3.660088e-08
1e-32 :    3.660088e-40    3.660088e-08
1e-31 :    4.097193e-39    4.097193e-08
1e-30 :    4.615068e-38    4.615068e-08
1e-29 :    3.696983e-37    3.696983e-08
1e-28 :    2.999860e-36    2.999860e-08
1e-27 :    4.723454e-35    4.723454e-08
1e-26 :    3.801082e-34    3.801082e-08
1e-25 :    3.062408e-33    3.062408e-08
1e-24 :    4.876378e-32    4.876378e-08
1e-23 :    3.779378e-31    3.779378e-08
1e-22 :    3.144592e-30    3.144592e-08
1e-21 :    4.991049e-29    4.991049e-08
1e-20 :    3.949261e-28    3.949261e-08
1e-19 :    3.002761e-27    3.002761e-08
1e-18 :    5.162480e-26    5.162480e-08
1e-17 :    4.135703e-25    4.135703e-08
1e-16 :    3.282146e-24    3.282146e-08
1e-15 :    4.722129e-23    4.722129e-08
1e-14 :    3.863295e-22    3.863295e-08
1e-13 :    3.375549e-21    3.375549e-08
1e-12 :    4.011790e-20    4.011790e-08
1e-11 :    4.011790e-19    4.011790e-08
1e-10 :    3.392060e-18    3.392060e-08
1e-9  :    5.471206e-17    5.471206e-08
1e-8  :    4.072652e-16    4.072652e-08
1e-7  :    3.496987e-15    3.496987e-08
1e-6  :    5.662626e-14    5.662626e-08
1e-5  :    4.412957e-13    4.412957e-08
1e-4  :    3.482083e-12    3.482083e-08
1e-3  :    5.597344e-11    5.597344e-08
1e-2  :    4.620014e-10    4.620014e-08
1e-1  :    3.540690e-09    3.540690e-08
1e0   :    2.817751e-08    2.817751e-08

The relative error is on the order of 1e-8 for values above 1e-38, well below the 1e-6 resolution reported by np.finfo, and the error is still acceptable even when the value is smaller than the tiny value from np.finfo.

It looks safe to save my data in float32, but why does my test seem inconsistent with the numbers from np.finfo?
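
For reference, a minimal sketch of the round trip I have in mind (the filename data.npy and the synthetic data are placeholders):

import numpy as np

# synthetic float64 data spanning the stated range 1e-12 .. 10
data = 10.0 ** np.random.uniform(-12, 1, 1000)

np.save('data.npy', data.astype(np.float32))  # stored at half the size

restored = np.load('data.npy').astype(np.float64)
print(np.max(np.abs(restored - data) / data))  # bounded by eps/2, about 6e-8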

Upvotes: 4

Views: 4613

Answers (2)

Reality Pixels

Reputation: 416

Since the machine epsilon of float32 is 1.1920929e-07, rounding keeps the relative error within half of that for normal floats: 5.9604645e-08. However, once you get smaller than 1.1754944e-38 you have denormalized numbers, which are spaced a fixed 1.4012985e-45 apart, so the rounding error is bounded in absolute terms instead.
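
One way to see both regimes directly is np.spacing, which gives the gap from a value to the next representable float (a quick sketch, standard NumPy only):

import numpy as np

# normal range: the spacing is relative, roughly eps * value
print(np.spacing(np.float32(1.0)))    # 1.1920929e-07
print(np.spacing(np.float32(1e-30)))  # scales down with the value

# subnormal range: the spacing is a fixed absolute step of 2**-149
print(np.spacing(np.float32(0.0)))    # 1.401298e-45
print(np.spacing(np.float32(1e-40)))  # 1.401298e-45, same for all subnormals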

Upvotes: 1

recursive

Reputation: 86124

Numbers that low are in the subnormal range. Basically, the exponent doesn't have enough range to get sufficiently low, so you're gradually losing significant bits as values get lower. This is called "gradual underflow".

https://en.wikipedia.org/wiki/Denormal_number
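
A short sketch of gradual underflow in action (standard NumPy only):

import numpy as np

# below ~1.18e-38 the exponent is pinned at its minimum, so each further
# factor of 10 costs significand bits and the round-trip error grows
for i in [-37, -39, -41, -43, -45]:
    v = 10.0 ** i
    f = float(np.float32(v))
    print('1e%d -> relative error %.1e' % (i, abs(f - v) / v))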

Upvotes: 6
