Reputation: 9
Are quantized versions available for transformer models beyond LLMs, specifically for translation models? I've searched online but haven't found a definitive answer on the availability of quantized versions for these models.
Upvotes: 0
Views: 898
Reputation: 122168
It might not work for every model, but you can try 8-bit dynamic quantization with native PyTorch (https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html), like this:
import gc

import torch
from torch import nn

def quantize_model(model: nn.Module, precision=torch.qint8, layers={nn.LayerNorm, nn.Linear, nn.Dropout}):
    # Swap the listed layer types for dynamically quantized equivalents.
    # Note: quantize_dynamic only has dynamic implementations for a few
    # types (chiefly nn.Linear here); the others are passed through as-is.
    model_q = torch.quantization.quantize_dynamic(
        model, layers, dtype=precision
    )
    # Drop the full-precision model and force garbage collection to free RAM.
    del model
    gc.collect()
    return model_q
Then:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
model = quantize_model(model)
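To sanity-check that the quantized model still translates, you can run a quick generation. This is only a sketch assuming NLLB's tokenizer conventions (a src_lang code and a forced target-language token); the language codes and sentence are just examples:

from transformers import AutoTokenizer

# Translate one English sentence to French with the quantized model.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="eng_Latn"
)
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(
    **inputs,
    # NLLB expects the target language code as the forced first token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=40,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))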
There are a few more examples of 8-bit quantization at https://github.com/alvations/lightyear/tree/main/lightyear/translators
But for more modern models (post-2022, ChatGPT-era), I'd suggest looking at bitsandbytes: https://huggingface.co/blog/4bit-transformers-bitsandbytes
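The approach in that blog post boils down to passing a BitsAndBytesConfig to from_pretrained. A minimal sketch, assuming bitsandbytes, accelerate, and a recent transformers are installed and a GPU is available; whether 4-bit loading actually works for a given seq2seq model depends on its architecture:

import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

# 4-bit NF4 quantization settings, following the blog post linked above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",  # any HF model id; NLLB as an example
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place layers on the available GPU(s)
)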
Upvotes: 0