Using Language Model Phi-3-Mini quantized version in Jupyter Notebook

Question

I am trying to use a small language model in my jupyter notebook and am not able to find a working solution. I want to use the quantized version of Phi-3-mini as that is small enough to fit on my GPU and runs faster. Loading the normal version of phi-3-mini works just fine. But when loading the quantized version I always get a ValueError saying that: "Unrecognized configuration class to build an AutoTokenizer."

From the documentation on HuggingFace: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx it says that only the onnx version is quantized so I am using that version.

from transformers import AutoTokenizer, AutoModelForCausalLM

# This works just fine (normal version but too big for my GPU)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct",trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct",trust_remote_code=True)

# But this throws an error (quantized version)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct-onnx", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct-onnx", trust_remote_code=True)

Using Language Model Phi-3-Mini quantized version in Jupyter Notebook

Answers (0)

Related Questions