Reputation: 11
I am trying to compare the per-layer outputs of two ML models trained with different libraries (TensorFlow and a custom, lighter one) that have the same validation performance, as a sort of unit test, and I see big differences in these tests between float32 training and float16 training, even though the prediction performances are very similar.
When training the model with float32 weights, I use np.isclose with atol=1e-08 to compare the layer outputs and get very minimal errors, but when I train the model with float16 weights, I get much bigger errors, even though the two models have the same performance and I increased the atol to 1e-04.
I am wondering if just increasing the atol to 1e-04 is the right approach, since the np.isclose comparison of the layer outputs gives far more errors in float16 than in float32, even though the two models have the same prediction performance.
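For reference, here is a quick check of the relative resolution of the two formats with numpy (nothing model-specific, just np.finfo):
import numpy as np
# machine epsilon: the relative gap between adjacent representable values at 1.0
print(np.finfo(np.float32).eps)  # ~1.19e-07, so atol=1e-08 is already below one step
print(np.finfo(np.float16).eps)  # ~9.77e-04, so atol=1e-04 is below one step at unit scale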
EDIT: To add more context, I made a small script to reproduce the "error" I get that I don't understand. Here is an example with a dense layer and the two different implementations (keras and the custom lib):
import keras
import keras.backend
import numpy as np

def custom_dense_layer(_input_data, weights, biases) -> np.ndarray:
    # weight matrix = [n1_weights, n2_weights, ...], so this computes (X.W + B)
    # instead of (transpose(W).transpose(X) + transpose(B))
    _layer_output = np.matmul(_input_data, weights) + biases
    return _layer_output

if __name__ == '__main__':
    bit_precision = 16  # sets the computation and weights/biases dtypes; the goal is to compare 16 vs 32
    keras.backend.set_floatx(f'float{bit_precision}')  # set computation dtype
    weights = np.random.randn(16, 16).astype(f'float{bit_precision}')  # generate random weights
    biases = np.random.randn(16).astype(f'float{bit_precision}')  # generate random biases
    inputs = keras.Input(shape=(16,), dtype=f'float{bit_precision}')  # set input shape
    input_data = np.random.randn(1000, 16).astype(f'float{bit_precision}')  # generate random data
    keras_dense = keras.layers.Dense(units=16)  # create keras dense layer
    keras_dense(inputs)  # build the layer so the weights can be set
    keras_dense.set_weights([weights, biases])  # set the weights and biases
    custom_op = custom_dense_layer(_input_data=input_data, weights=weights, biases=biases)  # custom dense layer output
    keras_op = np.array(keras_dense(input_data))  # keras output
    # per-row closeness check, scaling tolerances with the precision:
    # atol = 10**-(16/4) = 1e-4, rtol = 2**-(16/2) = 2**-8
    isclose = np.isclose(keras_op, custom_op, atol=10**-(bit_precision / 4), rtol=2**-(bit_precision / 2)).all(axis=1)
    failed_points = np.where(~isclose)[0]  # rows that are not close enough
    perc_failed = round(100 * failed_points.size / isclose.size, 3)  # "failed" rows percentage
    print(perc_failed)
Even though, with float16, I scale the tolerances proportionally to the bit reduction, I get a 60+% failure rate, versus only ~2% with float32.
Upvotes: 1
Views: 150
Reputation: 71
If I understand the question correctly, you are attempting to check float32 values against float16 values for equality using np.isclose().
One solution I can think of is using relative tolerance in addition to absolute tolerance in the parameters of the np.isclose() function. np.isclose(a, b) passes when |a - b| <= atol + rtol * |b|, so the relative tolerance scales the allowed difference with the magnitude of the second argument. Source: https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
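As a quick sketch of that documented check (the values here are arbitrary, only meant to show how rtol scales with magnitude):
import numpy as np
# np.isclose(a, b) passes when |a - b| <= atol + rtol * |b|
a, b = np.float32(1000.0), np.float32(1000.05)
print(np.isclose(a, b, atol=1e-8, rtol=0.0))   # False: a fixed atol ignores magnitude
print(np.isclose(a, b, atol=1e-8, rtol=1e-4))  # True: 1e-4 * 1000.05 ~ 0.1 covers the 0.05 gap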
For example, if I were to make a 32-bit float s32 larger than a 16-bit float s16 by a fraction so small it couldn't be captured in 16-bit precision:
import numpy as np
s16 = np.float16(2)
s32 = np.float32(2 + 2**(-15))
# reinterpret the raw bits of each float as an unsigned integer of the same width
u16 = np.array(s16, dtype=np.float16).view(dtype=np.uint16)
u32 = np.array(s32, dtype=np.float32).view(dtype=np.uint32)
print('32:\t' + bin(u32))
print('16:\t' + bin(u16))
It outputs:
32: 0b1000000000000000000000010000000
16: 0b100000000000000
So the difference cannot be represented in 16 bits. Checking for equality with only a tiny absolute tolerance (rtol left at its small default of 1e-5) evaluates to False:
>>> print('Equality:\t', np.isclose(s16, s32, atol=(1e-8)))
Equality: False
But with some relative tolerance, set to 2**(-16) to account for the precision difference (so that rtol * |s32| comes out just above the actual gap of 2**(-15)):
>>> print('Equality:\t', np.isclose(s16, s32, atol=(1e-8), rtol=(2**(-16))))
Equality: True
So, to answer your question: adding a relative tolerance that accounts for the difference in precision is a mathematically justified way to set the tolerances of the equality check. Hopefully this gives you more accurate results when testing the ML model, but without more information about the custom library you are using (i.e. whether it uses numpy for floats), this is the best advice I can give you.
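Applied to the layer-output comparison in the question, a minimal sketch could look like the following; the helper name layers_close and the exact multipliers are my own assumptions to tune, not a prescription:
import numpy as np

def layers_close(a, b, dtype=np.float16):
    # hypothetical helper: scale tolerances to the precision actually used
    eps = np.finfo(dtype).eps    # relative step of the format (2**-10 for float16)
    tiny = np.finfo(dtype).tiny  # smallest normal number, used as a floor near zero
    return np.isclose(a, b, rtol=4 * eps, atol=tiny)
For float16 this gives rtol ~ 3.9e-3, a few ULPs of headroom for the rounding that accumulates across a matmul.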
Upvotes: 1
Reputation: 110186
You should be using rtol instead of atol for comparing floating-point numbers. And then, 1e-04 is close to the resolution of the float16 mantissa (1/1024) ( https://en.wikipedia.org/wiki/Bfloat16_floating-point_format ) - roughly, it means that an atol of 1e-4 will allow for about a one-bit difference when working close to the unit scale (i.e.: exponents near 0). So, yes, it does not seem like using any better precision than that would be useful for float16.
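To make that concrete, a quick sketch of the float16 step size (one ULP) at a few magnitudes, using np.spacing:
import numpy as np
# np.spacing(x) returns the gap between x and the next representable value
for x in (0.1, 1.0, 10.0):
    print(x, np.spacing(np.float16(x)))
# ~6.1e-05 at 0.1, ~9.8e-04 at 1.0, ~7.8e-03 at 10.0
A fixed atol of 1e-4 is on the order of one ULP around 0.1 and far below one ULP at larger magnitudes, which is why a relative tolerance tracks float16's precision better.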
Upvotes: 0