Reputation: 9
Are quantized versions available for transformer models beyond LLMs, specifically for translation models? I've searched online but haven't found a definitive answer on the availability of quantized versions for these models.
Upvotes: 0
Views: 898
Reputation: 122168
It might not work for every model, but you can try 8-bit dynamic quantization with native PyTorch (https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html), like this:
import gc

import torch
from torch import nn

def quantize_model(model: nn.Module, precision=torch.qint8, layers={nn.LayerNorm, nn.Linear, nn.Dropout}):
    # Swap the listed layer types for dynamically quantized equivalents.
    # Note: quantize_dynamic only has dynamic implementations for a few
    # types (chiefly nn.Linear here); the others are passed through as-is.
    model_q = torch.quantization.quantize_dynamic(
        model, layers, dtype=precision
    )
    # Drop the full-precision model and force garbage collection to free RAM.
    del model
    gc.collect()
    return model_q
Then:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
model = quantize_model(model)
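To sanity-check that the quantized model still translates, you can run a quick generation. This is only a sketch assuming NLLB's tokenizer conventions (a src_lang code and a forced target-language token); the language codes and sentence are just examples:

from transformers import AutoTokenizer

# Translate one English sentence to French with the quantized model.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
)
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(
    **inputs,
    # NLLB expects the target language code as the forced first token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=40,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))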
There are a few more examples of 8-bit quantization at https://github.com/alvations/lightyear/tree/main/lightyear/translators
But for more modern models (post-2022, ChatGPT-era), I'd suggest looking at bitsandbytes: https://huggingface.co/blog/4bit-transformers-bitsandbytes
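The approach in that blog post boils down to passing a BitsAndBytesConfig to from_pretrained. A minimal sketch, assuming bitsandbytes, accelerate, and a recent transformers are installed and a GPU is available; whether 4-bit loading actually works for a given seq2seq model depends on its architecture:

import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

# 4-bit NF4 quantization settings, following the blog post linked above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",  # any HF model id; NLLB as an example
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place layers on the available GPU(s)
)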
Upvotes: 0