What is the optimal precision for training a typical Deep Neural Network?

I am researching the optimal precision for training a DNN. I learned that, for inference, even a compressed 8-bit precision should work, but for training we need higher-precision numbers. What would be the optimal precision for deep learning (fp16, fp32, or fp64)? I may use tensorflow-gpu for this purpose.

Upvotes: 0

Views: 520

Answers (2)

Prune

Reputation: 77847

This depends on your evaluation function for "optimal": is your focus training time (lower precision is faster), accuracy (lower precision is often less accurate), or some other resource? It also depends somewhat on the model's complexity and topology.

A ConvNet on MNIST will do fine with 8-bit floats: training is faster, and the accuracy difference (if any) will be insignificant. If you move to something more interdependent and fragile (perhaps a kernel-starved GNN), then you'll notice a loss of accuracy when dropping to 8-bit.

Again, depending on your needs, you can sometimes save training time by dropping to 8-bit floats and recover some of the lost accuracy by slightly widening your model (more kernels in the convolution layers).
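The accuracy cost of low precision is easy to see with plain NumPy: a minimal sketch (the `accumulate` helper is hypothetical, not from TensorFlow) showing how accumulating many small updates, as a training loop accumulates gradients, stalls in float16 once the running sum outgrows the format's spacing.

```python
import numpy as np

# Hypothetical illustration: repeatedly add a small step to a running
# total, as gradient accumulation does during training. In float16 the
# spacing between representable numbers at 2048 is 2, so adding 1.0
# no longer changes the sum and accumulation silently stalls.
def accumulate(dtype, step=1.0, n=5000):
    total = dtype(0)
    for _ in range(n):
        total = dtype(total + dtype(step))
    return float(total)

print(accumulate(np.float16))  # stalls at 2048.0
print(accumulate(np.float32))  # reaches 5000.0 exactly
```

The same effect hits real training when small gradient updates fall below the resolution of a low-precision weight, which is one reason mixed-precision schemes keep a float32 master copy of the weights.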

Upvotes: 0

BlueSun

Reputation: 3570

The optimal precision is float32 in most cases. float64 makes execution on the GPU significantly slower. On the other hand, unless you have a Tesla P100 GPU, using float16 will not make execution faster either.

Upvotes: 2
