Reputation: 11
I am trying to embed the text of computer code using Codestral, and I keep getting vectors of different dimensions. Obviously this is not desirable behaviour for an embedding. I suspect this is a bug, but I haven't ruled out the possibility that this is weird intended behaviour for Codestral.
I would like to embed things with Codestral in a space with constant dimension. Life would be easier if the code could run on a computer without a GPU (which is why I'm using llama_cpp and quantized models), but if this is a fundamental issue I can work around it.
I've tried using Codestral in a few different ways, but to give a concrete minimal example, my most recent run used:
from llama_cpp import Llama
model_kwargs = {
    "n_ctx": 4096,
    "n_threads": 4,
    "n_gpu_layers": 0,
}
model_path = "Codestral-22B-v0.1-IQ4_XS.gguf"
llm = Llama(model_path=model_path, embedding=True, **model_kwargs)
text = "def f(x):\n    return x + 1"  # placeholder for the code I'm embedding
embedding = llm.embed(text)
I've used similar code with llama_cpp in the same environment with other quantized models and obtained reasonable results. I've also checked a few basic things about what is going on (e.g. it isn't a matter of "chunking": the dimension tops out at 512, which I hit even for short pieces of code, and the GCD of the dimensions I've seen is 1). I've also tried running the generative model with the same environment, model file, etc., and it seems perfectly fine.
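To make the dimension issue concrete, here is roughly the kind of check I mean; the snippet strings below are placeholders rather than my actual inputs, but the model settings match the run above:

import math
from functools import reduce
from llama_cpp import Llama

llm = Llama(model_path="Codestral-22B-v0.1-IQ4_XS.gguf",
            n_ctx=4096, n_threads=4, n_gpu_layers=0, embedding=True)

# placeholder code snippets standing in for the code I actually embed
snippets = [
    "def add(a, b): return a + b",
    "for i in range(10):\n    print(i)",
    "import os\nprint(os.getcwd())",
]

dims = [len(llm.embed(s)) for s in snippets]
print(dims)                    # this is where I see different lengths, topping out at 512
print(reduce(math.gcd, dims))  # and the GCD of the lengths comes out as 1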
Upvotes: 0
Views: 50