I am trying to build an app with Streamlit in which a bot answers users' questions based on the content of a CSV file.
I'm new to LangChain and have some questions about document retrieval. In 'embeddings.py' I've created a vector store containing embeddings for a CSV file. Each row in the CSV represents an attraction, so I split the data per row. The goal is for my bot to generate answers based on the information in the CSV file.
```python
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders.csv_loader import CSVLoader

DB_FAISS_PATH = "vectorstore/db_faiss"

# CSVLoader yields one Document per CSV row
loader = CSVLoader(file_path="./data/cleanTripLisbon.csv", encoding="utf-8", csv_args={'delimiter': ','})
data = loader.load()

text_splitter = CharacterTextSplitter(separator='\n')
text_chunks = text_splitter.split_documents(data)

# Embed each chunk and store the vectors in a local FAISS index
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
docsearch = FAISS.from_documents(text_chunks, embeddings)
docsearch.save_local(DB_FAISS_PATH)
```
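For reference, CSVLoader already produces one Document per row, whose page_content lists each column as a "column: value" line. The stdlib-only sketch below approximates that conversion; it is not LangChain's code, and the sample columns are made up. It also shows why splitting these documents on newlines rarely changes anything: each row is far shorter than CharacterTextSplitter's default chunk size of 4000 characters, so it stays a single chunk.

```python
import csv
import io


def rows_to_documents(csv_text: str, source: str) -> list[dict]:
    """Approximation of CSVLoader: one document per row, 'column: value' lines."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    for i, row in enumerate(reader):
        content = "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"source": source, "row": i}})
    return documents


# Hypothetical sample; the real file's columns are not shown in the question
sample = "name,category,neighbourhood\nBelem Tower,Monument,Belem\nOceanarium,Aquarium,Parque das Nacoes\n"
docs = rows_to_documents(sample, "cleanTripLisbon.csv")
print(docs[0]["page_content"])
```

Because every row is one self-contained document, the text splitter step can usually be dropped and `data` passed straight to `FAISS.from_documents`.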
In chat_bot.py, I have my bot implementation.
```python
from util import local_settings
from openai import OpenAI


class GPT_Helper:
    def __init__(self,
                 OPENAI_API_KEY: str,
                 system_behavior: str = "",
                 model="gpt-3.5-turbo",
                 ):
        self.client = OpenAI(api_key=OPENAI_API_KEY)
        self.messages = []
        self.model = model

        if system_behavior:
            self.messages.append({
                "role": "system",
                "content": system_behavior
            })

    def get_completion(self, prompt, temperature=0):
        self.messages.append({"role": "user", "content": prompt})

        completion = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=temperature,
        )

        self.messages.append(
            {
                "role": "assistant",
                "content": completion.choices[0].message.content
            }
        )

        return completion.choices[0].message.content


class AttractionBot:
    def __init__(self, system_behavior: str):
        self._system_behavior = system_behavior
        self._username = None  # Add a private attribute to store the username

        self.engine = GPT_Helper(
            OPENAI_API_KEY=local_settings.OPENAI_API_KEY,
            system_behavior=system_behavior
        )

    def set_username(self, username):
        self._username = username

    def generate_response(self, message: str):
        # Include the username in the message if available
        user_message = f"{self._username}: {message}" if self._username else message

        # Generate response using the language model
        response = self.engine.get_completion(user_message)

        return response

    def reset(self):
        ...

    @property
    def memory(self):
        return self.engine.messages

    @property
    def system_behavior(self):
        return self._system_behavior

    @system_behavior.setter
    def system_behavior(self, system_config: str):
        self._system_behavior = system_config
```
My main question: how do I get the bot to use the information in that CSV? I know I probably need to call docsearch.as_retriever(), but I don't know where to plug it in.
I suggest trying the LangChain pandas DataFrame agent. It generates and executes pandas code on your DataFrame based on text input. It may not be the best option for a production environment, but it is the quickest way to prototype before you develop a stable version:
https://python.langchain.com/docs/integrations/toolkits/pandas
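The agent's job is to turn a question into code, run that code on the table, and hand the result back to the model. Below is a hand-written stand-in for the kind of code it might produce for "How many attractions are museums?". It uses the csv module instead of pandas so it runs without extra dependencies, and the column names and rows are hypothetical.

```python
import csv
import io

# Hypothetical extract of the attractions table (columns are made up)
CSV_TEXT = """name,category
Belem Tower,Monument
National Tile Museum,Museum
Calouste Gulbenkian Museum,Museum
Oceanarium,Aquarium
"""


def count_by_category(csv_text: str, category: str) -> int:
    """Count rows whose 'category' column matches exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row["category"] == category)


print(count_by_category(CSV_TEXT, "Museum"))  # 2
```

Because the generated code reads every row, counts and filters like this come out exact, whereas a similarity-search retriever only hands the model a few of the closest rows.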
You can use the docsearch object you created from the FAISS index to retrieve information from the CSV file. Call its as_retriever() method to turn it into a retriever, use the retriever to fetch the rows relevant to each user query, and then send those rows to the model together with the question.
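```python
# In chat_bot.py, alongside GPT_Helper and the local_settings import
class AttractionBot:
    def __init__(self, system_behavior: str, docsearch):
        self._system_behavior = system_behavior
        self._username = None
        self.engine = GPT_Helper(
            OPENAI_API_KEY=local_settings.OPENAI_API_KEY,
            system_behavior=system_behavior
        )
        # Turn the FAISS vector store into a retriever
        self.doc_retriever = docsearch.as_retriever()

    def generate_response(self, message: str):
        # Include the username in the message if available
        user_message = f"{self._username}: {message}" if self._username else message
        # Fetch the CSV rows most similar to the question (a list of Documents);
        # on older LangChain versions use get_relevant_documents(message) instead
        relevant_docs = self.doc_retriever.invoke(message)
        context = "\n\n".join(doc.page_content for doc in relevant_docs)
        # Send the retrieved rows together with the question to the model
        prompt = (
            "Answer using only the following attraction data:\n"
            f"{context}\n\nQuestion: {user_message}"
        )
        return self.engine.get_completion(prompt)
```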
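You then need to pass the docsearch object when initializing the AttractionBot. In the Streamlit app this is the index saved by 'embeddings.py', reloaded with FAISS.load_local() using the same embeddings model: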
```python
attraction_bot = AttractionBot(system_behavior="your_system_behavior", docsearch=docsearch)
```
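To see the retrieve-then-prompt flow without an API key or a FAISS index, here is a dependency-free sketch. The toy retriever ranks rows by word overlap, a deliberately crude stand-in for the embedding similarity search FAISS performs, and build_prompt is a hypothetical helper showing how the retrieved rows end up in the text sent to the model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for the toy overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, rows: list[str], k: int = 2) -> list[str]:
    """Stand-in for a real retriever: top-k rows by shared words (not embeddings)."""
    q = tokenize(query)
    ranked = sorted(rows, key=lambda row: len(q & tokenize(row)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_rows: list[str]) -> str:
    """Hypothetical helper: place the retrieved rows in front of the question."""
    context = "\n\n".join(context_rows)
    return (
        "Answer using only the following attraction data:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical rows in the "column: value" form CSVLoader produces
rows = [
    "name: Belem Tower\ncategory: Monument",
    "name: Oceanarium\ncategory: Aquarium",
    "name: National Tile Museum\ncategory: Museum",
]
top = toy_retrieve("Which museum should I visit?", rows, k=1)
print(build_prompt("Which museum should I visit?", top))
```

In the real bot, the retriever returned by docsearch.as_retriever() takes the place of toy_retrieve (it returns Document objects, so you join their page_content), and the resulting prompt is what gets passed to get_completion().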