Hamid K

Reputation: 1165

Text Embeddings from a Fine-tuned Llama 2 Model

I have fine-tuned my locally loaded Llama 2 model and saved the adapter weights locally. To load the fine-tuned model, I first load the base model and then wrap it with my PEFT adapter like below:

model = PeftModel.from_pretrained(base_model, peft_model_id)
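
For context, a minimal sketch of the full loading flow (the base checkpoint name and adapter path below are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = 'meta-llama/Llama-2-7b-hf'  # placeholder base checkpoint
peft_model_id = 'path/to/adapter'           # placeholder for the locally saved adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

# wrap the base model with the fine-tuned adapter
model = PeftModel.from_pretrained(base_model, peft_model_id)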

Now I want to get text embeddings from my fine-tuned Llama model using LangChain, but LlamaCppEmbeddings accepts a model_path argument, not a loaded model object. What is the best way to create text embeddings using an already-loaded model?

embeddings = LlamaCppEmbeddings(model_path=llama_model_path, n_ctx=2048)

My questions are:

1. How do I save my fine-tuned model (base + PEFT) locally, not only the PeftModel?
2. How can I create text embeddings from my loaded model?

Thanks in advance.

Upvotes: 1

Views: 1579

Answers (1)

Mike B

Reputation: 3476

You can create and persist your embeddings by using any of the vector stores available in LangChain. In this example, FAISS is used.
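
To address question 1 first: LlamaCppEmbeddings needs a model file on disk, not an in-memory model, so the adapter weights have to be merged into the base model and saved. Below is a minimal sketch, assuming a LoRA-style adapter (merge_and_unload() only applies to mergeable adapter types) and placeholder paths; the saved Hugging Face checkpoint would then still need to be converted to a llama.cpp-compatible file before it can be passed as model_path:

# merge the adapter weights into the base model so they can be saved as one checkpoint
merged_model = model.merge_and_unload()

# save the merged model (and tokenizer) as a regular Hugging Face checkpoint
merged_model.save_pretrained('path/to/merged_model')
tokenizer.save_pretrained('path/to/merged_model')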

from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter



# load pdfs in folder
loader = DirectoryLoader(
    'path/to/pdfs',
    glob='*.pdf',
    loader_cls=PyPDFLoader
)
documents = loader.load()

# split all pdfs
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
splits = text_splitter.split_documents(documents)

# load the fine-tuned model with llama.cpp (model_path must point to a llama.cpp-compatible model file)
embedder = LlamaCppEmbeddings(model_path="/path/to/fine_tuned_model.bin")

# create faiss vectorstore
vectorstore = FAISS.from_documents(
    documents=splits,
    embedding=embedder,
)

# persist locally
vectorstore.save_local('./faiss_vs')

# alternatively, create and persist a chroma vectorstore
# from langchain.vectorstores import Chroma
# vectorstore = Chroma.from_documents(
#     documents=splits,
#     embedding=embedder,
#     persist_directory='./chroma_vs'
# )
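
Once persisted, the index can be reloaded with the same embedding model and queried; a short usage sketch (the query string is just an example):

# reload the persisted FAISS index with the same embedding model
vectorstore = FAISS.load_local('./faiss_vs', embedder)

# retrieve the chunks most similar to a query
docs = vectorstore.similarity_search('What does the document say about fine-tuning?', k=3)
for doc in docs:
    print(doc.page_content)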

Upvotes: 0
