Reputation: 13
I'm working on a project that uses llama_index to retrieve document information in Jupyter Notebook, but I'm experiencing very slow query response times (around 15 minutes per query). I'm using the following code:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
documents = SimpleDirectoryReader("C:path/example/data").load_data()
# Using bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Setting up Ollama LLM with a timeout of 1 hour
Settings.llm = Ollama(model="llama3", request_timeout=3600.0)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
I'm running this in a local Jupyter notebook, and it consistently takes 15 minutes or longer to return a result.
I tried reducing request_timeout to speed up the query, but that only produces a ReadTimeout error:
The code is identical to the above except for the timeout:

# Setting up Ollama LLM with a 60-second timeout
Settings.llm = Ollama(model="llama3", request_timeout=60.0)
How can I speed up the response time when querying documents? Are there ways to optimize the setup, or to run the models fully locally to improve retrieval speed? Specifically, is there a way to handle both the embeddings and the LLM processing locally to avoid network latency or timeouts? Any help on reducing retrieval time or configuring a local model setup would be appreciated.
Upvotes: 0
Views: 127
Reputation: 375
As @AKX commented, you'll first need to measure and tell us how long the different parts of your code take, for example the document-indexing line and the querying line.
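A minimal timing sketch, reusing the imports, documents, and Settings from your question, so you can see whether the time goes into indexing (computing embeddings) or into the query (retrieval plus LLM generation):

import time

t0 = time.perf_counter()
index = VectorStoreIndex.from_documents(documents)  # embeddings are computed here
print(f"Indexing took {time.perf_counter() - t0:.1f}s")

query_engine = index.as_query_engine()

t0 = time.perf_counter()
response = query_engine.query("What did the author do growing up?")
print(f"Query (retrieval + LLM generation) took {time.perf_counter() - t0:.1f}s")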
You can try a lighter LLM like Llama 2 7B Q4_K_M.
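If you want to stay with Ollama, one option is to point Settings.llm at a smaller quantized model. A sketch, assuming the llama2:7b-chat-q4_K_M tag is available in your local Ollama library (check with ollama list, or pull it first):

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Pull the model first, e.g.: ollama pull llama2:7b-chat-q4_K_M
Settings.llm = Ollama(model="llama2:7b-chat-q4_K_M", request_timeout=600.0)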
If you're using LlamaCPP, pass model_kwargs={"n_gpu_layers": -1} to offload all layers to the GPU for faster inference. For example:
from llama_index.llms.llama_cpp import LlamaCPP

llm_url = 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf'
llm = LlamaCPP(model_url=llm_url, temperature=0.7, max_new_tokens=256, context_window=4096, generate_kwargs={"stop": ["</s>", "[INST]", "[/INST]"]}, model_kwargs={"n_gpu_layers": -1}, verbose=True)
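Wiring that into your existing pipeline would look roughly like this (a sketch; it assumes the llama-index-llms-llama-cpp integration is installed and llama-cpp-python was built with GPU support):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = llm  # the LlamaCPP instance from above

documents = SimpleDirectoryReader("C:path/example/data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What did the author do growing up?"))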
You can refer to my full working script from a while ago, which takes ~3 seconds to index the documents and ~5 seconds to produce the first generated token: https://colab.research.google.com/github/kazcfz/LlamaIndex-RAG/blob/main/LlamaIndex_RAG.ipynb
Upvotes: 0