Mary

Reputation: 21

Retrieval Augmented Generation for Question-answering - langchain

I am trying to build an app using Streamlit in which the bot answers users' questions based on the content of a CSV file.

I'm new to working with LangChain and have some questions regarding document retrieval. In the 'embeddings.py' file, I've created a vector store containing embeddings for a CSV file. Each row in the CSV represents an attraction, so I have split the data per row. The goal is for my bot to generate answers based on the information in the CSV file.

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders.csv_loader import CSVLoader

DB_FAISS_PATH = "vectorstore/db_faiss"
loader = CSVLoader(file_path="./data/cleanTripLisbon.csv", encoding="utf-8", csv_args={'delimiter': ','})
data = loader.load()

text_splitter = CharacterTextSplitter(separator='\n')
text_chunks = text_splitter.split_documents(data)

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

docsearch = FAISS.from_documents(text_chunks, embeddings)
docsearch.save_local(DB_FAISS_PATH)

In chat_bot.py, I have my bot implementation.

from util import local_settings
from openai import OpenAI

class GPT_Helper:
    def __init__(self,
        OPENAI_API_KEY: str,
        system_behavior: str="",
        model="gpt-3.5-turbo",
    ):
        self.client = OpenAI(api_key=OPENAI_API_KEY)
        self.messages = []
        self.model = model

        if system_behavior:
            self.messages.append({
                "role": "system",
                "content": system_behavior
            })
    
    def get_completion(self, prompt, temperature=0):
    
        self.messages.append({"role": "user", "content": prompt})
    
        completion = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=temperature,
        )
    
        self.messages.append(
            {
                "role": "assistant",
                "content": completion.choices[0].message.content
            }
        )
        return completion.choices[0].message.content

class AttractionBot:

    def __init__(self, system_behavior: str):
        self._system_behavior = system_behavior
        self._username = None  # Add a private attribute to store the username
    
        self.engine = GPT_Helper(
            OPENAI_API_KEY=local_settings.OPENAI_API_KEY,
            system_behavior=system_behavior
        )
    
    
    def set_username(self, username):
        self._username = username
    
    def generate_response(self, message: str):
        # Include the username in the message if available
        user_message = f"{self._username}: {message}" if self._username else message
                
        # Generate response using the language model
        response = self.engine.get_completion(user_message)
    
        return response
    
    def reset(self):
        ...
    
    @property
    def memory(self):
        return self.engine.messages
    
    @property
    def system_behavior(self):
        return self._system_behavior
    
    @system_behavior.setter
    def system_behavior(self, system_config: str):
        self._system_behavior = system_config

My biggest question is: how do I get the bot to pull its information from that CSV? I am aware that I might need to use docsearch.as_retriever, but I don't know where...

Upvotes: 2

Views: 359

Answers (2)

Daniel Perez Efremova

Reputation: 710

I suggest trying the LangChain Pandas DataFrame agent. It generates and executes pandas code on your DataFrame based on text input. It may not be the best option for a production environment, but it is the quickest for prototyping before you develop a stable version:

https://python.langchain.com/docs/integrations/toolkits/pandas


Upvotes: 0

Chathura Abeywickrama

Reputation: 122

You can use the docsearch object you created from the FAISS index to retrieve information from the CSV file. Call its as_retriever method to turn it into a retriever, then use the retriever to fetch the rows relevant to each user query and pass them to the model along with the question.

class AttractionBot:
    def __init__(self, system_behavior: str, docsearch):
        # Other initialization code (engine, username, ...) stays as in the question

        # Wrap the FAISS vector store in a retriever
        self.doc_retriever = docsearch.as_retriever()

    def generate_response(self, message: str):
        # Include the username in the message if available
        user_message = f"{self._username}: {message}" if self._username else message

        # Use the retriever to fetch the rows most similar to the question
        # (newer LangChain releases prefer self.doc_retriever.invoke(message))
        relevant_docs = self.doc_retriever.get_relevant_documents(message)
        context = "\n\n".join(doc.page_content for doc in relevant_docs)

        # Generate the response from the retrieved context plus the user message
        # (adapt the prompt wording to your specific use case)
        prompt = f"Context:\n{context}\n\nQuestion: {user_message}"
        response = self.engine.get_completion(prompt)

        return response

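You need to pass the docsearch object when initializing the AttractionBot. In chat_bot.py you can rebuild it from the index saved by embeddings.py, for example with FAISS.load_local(DB_FAISS_PATH, embeddings), using the same embedding model.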

attraction_bot = AttractionBot(system_behavior="your_system_behavior", docsearch=docsearch)
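
To make the "retrieve, then generate" step easier to try out on its own, here is a minimal sketch of the prompt-building part. It assumes the retriever exposes get_relevant_documents and returns objects with a page_content attribute, which is how LangChain retrievers of that era behave. The names build_augmented_prompt, answer_with_context, FakeDoc and FakeRetriever are hypothetical helpers invented for this illustration, not part of LangChain or of the question's code, and the prompt wording is only a suggestion.

```python
from dataclasses import dataclass
from typing import Callable, List


def build_augmented_prompt(question: str, docs: list) -> str:
    """Place the retrieved CSV rows in a context block ahead of the question."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_with_context(question: str, retriever, complete: Callable[[str], str]) -> str:
    """Retrieve relevant rows, build the prompt, and hand it to the model.

    `retriever` is anything with get_relevant_documents(query);
    `complete` is any callable taking a prompt, e.g. GPT_Helper.get_completion.
    """
    docs = retriever.get_relevant_documents(question)
    return complete(build_augmented_prompt(question, docs))


# Hypothetical stand-ins so the sketch runs without LangChain or an API key.
@dataclass
class FakeDoc:
    page_content: str


class FakeRetriever:
    def __init__(self, docs: List[FakeDoc], k: int = 2):
        self.docs = docs
        self.k = k

    def get_relevant_documents(self, query: str) -> List[FakeDoc]:
        # A real retriever ranks by embedding similarity; this just returns the first k.
        return self.docs[: self.k]


if __name__ == "__main__":
    retriever = FakeRetriever([FakeDoc("name: Belem Tower"), FakeDoc("name: Oceanario")])
    # Echo the prompt instead of calling a model, to inspect what would be sent.
    print(answer_with_context("What can I visit?", retriever, lambda prompt: prompt))
```

In the bot itself you would pass docsearch.as_retriever() as the retriever and self.engine.get_completion as complete.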

Upvotes: 0
