celsowm

Reputation: 404

How can I get the same result from LlamaCPP using it in Llama-index?

I am trying to run the same prompt (query) against a simple PDF (a legal pt-br document) that I already ran with pure llama-cpp-python, but now through llama-index:

from llama_cpp import Llama
from pypdf import PdfReader

# Extract the full text of the PDF with pypdf
reader = PdfReader('inicial_pg10_teste.pdf') # https://files.pdfupload.io/documents/c2683560/inicial_pg10_teste.pdf

texto_inicial = ''
for page in reader.pages:
    texto_inicial += page.extract_text()

model_name = 'mistral-br-pt-q4_k_m.gguf' # https://huggingface.co/nicolasdec/CabraMistral7b-v2/blob/quantization/mistral-br-pt-q4_k_m.gguf
llm = Llama(
      model_path=f"llms/{model_name}",
      n_gpu_layers=20,
      n_ctx=7000,
)

# "Quem são os réus desta ação?" = "Who are the defendants in this lawsuit?"
response = llm.create_chat_completion(
      messages=[
          {
              "role": "user",
              "content": f"Quem são os réus desta ação? {texto_inicial}"
          }
      ]
)
print(response['choices'][0]['message']['content'])

Result (correct, by the way; it names both defendants in full):

Os réus neste processo trabalhista são:

1. Degustare e Servir Alimentação e Serviços Técnicos Ltda. (empresa de direito privado, inscrita no CNPJ nº 17.104.821/0001-70, com sede na Avenida do Rio Branco, nº 869, Centro, Niterói, Rio de Janeiro, CEP: 24020-006)
2. Secretaria de Estado de Educação do Rio de Janeiro (SEERJ) (pessoa jurídica de direito público interno, inscrita no CNPJ sob o nº 42.498.600/0001-71, com sede na Rua Pinheiro Machado, s/n°, Palácio da Guanabara, Laranjeiras, Rio de Janeiro/RJ, CEP 22.231-901)

But using llama_index:

from pathlib import Path
from llama_index.core import VectorStoreIndex, ServiceContext, PromptTemplate
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.readers.file import PDFReader

loader = PDFReader()
documents = loader.load_data(file=Path('inicial_pg10_teste.pdf')) # https://files.pdfupload.io/documents/c2683560/inicial_pg10_teste.pdf

model_name = 'mistral-br-pt-q4_k_m.gguf' # https://huggingface.co/nicolasdec/CabraMistral7b-v2/blob/quantization/mistral-br-pt-q4_k_m.gguf
llm = LlamaCPP(
      model_path=f"llms/{model_name}",
      model_kwargs={"n_gpu_layers": 20},
      context_window=7000,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local",
    chunk_size=6235
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Mistral instruction format: question first, retrieved context after
template = "<s> [INST] {query_str} {context_str} [/INST]"
custom_prompt = PromptTemplate(template)

query_engine = index.as_query_engine(text_qa_template=custom_prompt)
question = "Quem são os réus desta ação?" # same question as before
response = query_engine.query(question)
print(response)

The result is different (and wrong): the model only returns the generic party labels "a Reclamada" and "a 2ª Reclamada" ("the Defendant" and "the 2nd Defendant") instead of naming them:

Os réus neste processo são a Reclamada e a 2ª Reclamada.
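
To see which chunks actually reached the model, you can print the retrieved nodes. This is a minimal debugging sketch using llama-index's standard response.source_nodes attribute on the response object from the code above:

# Inspect what the retriever handed to the LLM
for node_with_score in response.source_nodes:
    print(f"score={node_with_score.score}")
    print(node_with_score.node.get_content()[:300]) # first 300 characters of the chunk
    print("---")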

So how can I get the same result from LlamaCPP when using it through llama-index?

Upvotes: 1

Views: 471

Answers (1)

Siddhartha Sengupta

Reputation: 64

Kindly use a dedicated vector store such as Pinecone or Qdrant. Reduce the chunk size and introduce chunk overlap, and consider switching to an embedding model such as "thenlper/gte-large" or similar; a sketch of these settings follows. Please let me know if these changes give you better results.
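
A minimal sketch of those settings, reusing the llm and documents from the question (HuggingFaceEmbedding lives in the llama-index-embeddings-huggingface package; the chunk_size and chunk_overlap values here are illustrative, not tuned):

from llama_index.core import ServiceContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# A stronger embedding model than the "local" default
embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-large")

# Smaller chunks with overlap, so each defendant's name and details stay in one chunk
service_context = ServiceContext.from_defaults(
    llm=llm,                 # the LlamaCPP instance from the question
    embed_model=embed_model,
    chunk_size=512,          # illustrative value
    chunk_overlap=64,        # illustrative value
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)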

Upvotes: 0
