Reputation: 95
Currently I train a Keras (on TensorFlow) model with the default setting, float32.
Post-training, the network is quantized: the weights are cast to float16. This improves performance by ~3x while keeping the same accuracy.
I tried training from the start using float16 and failed miserably. I cannot find any link that explains whether that is possible and, if not, why it is not possible.
Upvotes: 1
Views: 1992
Reputation: 24805
Automated Mixed Precision from NVIDIA might be a way to go.
From what I've gathered, it is (was) supported upstream since TensorFlow 1.14. All you would have to do is wrap your optimizer like this:
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
You might also need to set a specific environment variable from within your Python script, namely:
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
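For context, here is a minimal sketch of how those two pieces fit into a training script. It assumes TF 1.14/1.15; the toy model, optimizer and hyperparameters are made up for illustration, and the rewrite helper's module path moved between releases, so adjust for your version:

import os
import tensorflow as tf

# Enable the automatic mixed precision graph rewrite via the environment variable.
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

# Toy model purely for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
# Wrapping the optimizer makes TF insert float16 casts and loss scaling for you;
# the variables themselves are kept in float32.
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

model.compile(optimizer=opt, loss='sparse_categorical_crossentropy')
# model.fit(x_train, y_train, ...)  # then train as usual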
The above should already employ good mixed-precision training practices (e.g. loss scaling, keeping float32 where necessary, etc.).
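To give an idea of what loss scaling means, here is a conceptual TF 2.x-style sketch of the trick (not what the graph rewrite literally does, and the fixed scale of 1024 is just an assumption; real AMP adjusts it dynamically): small float16 gradients underflow to zero, so the loss is scaled up before backprop and the gradients are scaled back down before the update.

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_scale = 1024.0                               # assumed fixed scale for illustration

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
    scaled_loss = loss * loss_scale               # scale up so tiny gradients survive float16
grads = tape.gradient(scaled_loss, model.trainable_variables)
grads = [g / loss_scale for g in grads]           # undo the scaling before applying
optimizer.apply_gradients(zip(grads, model.trainable_variables))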
A good resource for this solution is NVIDIA's official documentation.
Some other resources I gathered which might also be useful (though they do not seem to indicate you would have to do anything more): here, here or here.
I would advise against manual casting as you might easily lose precision (e.g. in BatchNorm statistics used during inference) unless you know the ins and outs of specific layers.
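As a quick, framework-agnostic illustration of that kind of precision loss: accumulating a running statistic in float16 stalls once the accumulator dwarfs the increment, which is one reason mixed-precision tooling keeps such statistics in float32.

import numpy as np

acc16, acc32 = np.float16(0.0), np.float32(0.0)
for _ in range(100000):
    acc16 += np.float16(0.01)   # increments round away once acc16 grows past ~32
    acc32 += np.float32(0.01)
print(acc16, acc32)             # float16 stalls far below the true total of ~1000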
Additionally, you might also check the bfloat16 (brain float) type from Google, which has the exponent part of float32 (8 bits) and a smaller fraction. This allows it to keep a greater range of values (e.g. when computing gradients) compared to float16, which lets you avoid loss scaling.
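A quick way to see that range difference (a sketch, assuming TF 2.x eager execution; printed values are approximate):

import tensorflow as tf

x = tf.constant(1e5)                      # fits comfortably in float32
print(tf.cast(x, tf.float16).numpy())     # inf   -- float16 tops out around 65504
print(tf.cast(x, tf.bfloat16).numpy())    # ~1e5  -- bfloat16 shares float32's 8-bit exponent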
bfloat16 should be useful mainly on TPUs; AFAIK NVIDIA GPUs' support for it is not too great (someone correct me if I'm wrong). Some information here.
Upvotes: 2