lee Lin

Reputation: 51

Does Automatic Mixed Precision (AMP) halve the parameters of a model?

Before I knew about automatic mixed precision, I manually halved the model and the data with half() to train in half precision, but the training results were not good at all.
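For reference, this is roughly what the manual approach looked like; the toy model, shapes, and names here are only placeholders, not my real network:

    import torch
    import torch.nn as nn

    # Toy stand-in for the real network (placeholder only)
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16 * 30 * 30, 10),
    ).cuda().half()                     # every parameter cast to fp16

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    inputs = torch.randn(8, 3, 32, 32, device="cuda").half()   # data cast to fp16 as well
    targets = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)   # forward pass entirely in fp16
    loss.backward()                            # fp16 gradients can underflow/overflow
    optimizer.step()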

Then I used automatic mixed precision to train a network, which gives decent results. But when I save a checkpoint, the parameters in it are still in fp32. I would like to save a checkpoint in fp16, so I want to ask if and how I can do that. This also makes me wonder: when conv2d runs under autocast, are the parameters of conv2d also halved, or is it only the data?
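This is roughly how I use AMP now (the standard autocast + GradScaler recipe; the model, data, and file name are again placeholders for illustration):

    import torch
    import torch.nn as nn
    from torch.cuda.amp import autocast, GradScaler

    model = nn.Conv2d(3, 16, kernel_size=3).cuda()           # parameters stay in fp32
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = GradScaler()                                     # scales the loss against fp16 underflow

    inputs = torch.randn(8, 3, 32, 32, device="cuda")         # fp32 data

    optimizer.zero_grad()
    with autocast():                                          # ops run in fp16 or fp32 as autocast decides
        loss = model(inputs).float().mean()                   # dummy loss just for the sketch
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    torch.save(model.state_dict(), "checkpoint.pt")           # the saved weights come out as fp32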

Upvotes: 3

Views: 1380

Answers (1)

Rafael Toledo

Reputation: 1064

It does not apply half() to all parameters. Autocast analyzes each op separately: some run in FP16 and others stay in FP32.

From the documentation here.

torch.cuda.amp provides convenience methods for mixed precision, where some operations use the torch.float32 (float) datatype and other operations use torch.float16 (half). Some ops, like linear layers and convolutions, are much faster in float16 or bfloat16. Other ops, like reductions, often require the dynamic range of float32. Mixed precision tries to match each op to its appropriate datatype, which can reduce your network’s runtime and memory footprint.
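You can check this directly: under autocast a convolution runs in FP16, but the parameter stored on the module is untouched. A minimal sketch (CUDA required, since torch.cuda.amp targets GPU ops):

    import torch

    conv = torch.nn.Conv2d(3, 16, kernel_size=3).cuda()
    x = torch.randn(1, 3, 32, 32, device="cuda")

    with torch.cuda.amp.autocast():
        y = conv(x)

    print(conv.weight.dtype)   # torch.float32 -> stored parameters are not halved
    print(y.dtype)             # torch.float16 -> the op ran with on-the-fly fp16 copies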

About the checkpoints: a copy of the weights is maintained in FP32 precision to be used by the optimizer, as said here.
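If you really want an FP16 copy on disk, you can cast the state_dict yourself when saving. A sketch (the file names and the toy model are placeholders; you would pass your trained model instead):

    import torch

    model = torch.nn.Conv2d(3, 16, kernel_size=3)   # placeholder for the trained network

    # Regular checkpoint: the fp32 master weights that AMP and the optimizer work with
    torch.save(model.state_dict(), "ckpt_fp32.pt")

    # Half-precision copy, e.g. for storage or inference-only use
    half_state = {k: (v.half() if v.is_floating_point() else v)
                  for k, v in model.state_dict().items()}
    torch.save(half_state, "ckpt_fp16.pt")

Keep in mind that if you want to resume training, you generally want to keep the fp32 checkpoint, since the optimizer relies on the full-precision weights.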

Upvotes: 1
