I'm creating a chatbot for my university website using LangChain and the Gemini LLM. The problem is that, for some questions, the answer is a combination of the documents retrieved by the RAG pipeline, and that combination is wrong. Here is the code:
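# Import paths below are assumed for the classes used; they vary by LangChain version.
from langchain.docstore.document import Document
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# `formations` is the list of JSON records, each with 'content' and 'metadata'.
docs = []
for instance in formations:
    docs.append(
        Document(page_content=instance['content'], metadata=instance['metadata'])
    )

embeddings_model = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
llm = ChatGoogleGenerativeAI(model='gemini-1.5-pro-latest', temperature=0.3)

# %%
db_2 = Chroma.from_documents(docs, embeddings_model, collection_metadata={"hnsw:space": "cosine", 'k': 4})

metadata_field_info = [
    AttributeInfo(
        name="diplome",
        description="The name of the diploma. One of [DEUST, Licence en Sciences et Techniques, Master en Sciences et Techniques, Ingénieur d'État]",
        type="string",
    ),....]

# `document_content_description` is a string defined elsewhere (not shown).
retriever = SelfQueryRetriever.from_llm(
    llm,
    db_2,
    document_content_description,
    metadata_field_info,
    enable_limit=True
)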
As you can see, I'm using self-querying because the data is in JSON and has a metadata section that helps with the similarity search. Here is the PromptTemplate:
template = """you are now an ai assistant working at 'university name'.
answer using only the context.
if the given context can answer the input in anyway possible then you can answer it even if its not 100% what is required.
if the topic of the input is similar to the context then answer based on that even if its not 100% what was asked.
if more informations are needed you can guide the user to visit the website of the faculty which is: https://fstt.ac.ma/Portail2023/
your output should follow this structure:
introduction
the desired answer
instruct the user to visit the website for informations
ask if there is anything else you can help with.
context : {context}.
input : {input}.
output:
"""
prompt = PromptTemplate.from_template(template)
combine_chain_2 = create_stuff_documents_chain(llm, prompt)
retrieval_chain_2 = create_retrieval_chain(retriever, combine_chain_2)
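For reference, here is a minimal plain-Python sketch of what probably ends up in {context}. It assumes the stuff-documents chain is left on its default document prompt, which, as far as I know, passes only page_content for each document, joined by blank lines. The dictionaries are hypothetical stand-ins for the retrieved documents, and `stuff_default` and `stuff_with_names` are illustrative helpers, not LangChain functions.

```python
# Hypothetical stand-ins for two retrieved documents; texts are placeholders.
retrieved = [
    {
        "page_content": "description and objectifs of the master ...",
        "metadata": {"nom_filliere": "MST : Intelligence Artificielle et Sciences de Données"},
    },
    {
        "page_content": "description and objectifs of the master ...",
        "metadata": {"nom_filliere": "MST : Mobiquité et Big Data"},
    },
]


def stuff_default(docs, separator="\n\n"):
    """Mimic the assumed default: only page_content reaches the prompt."""
    return separator.join(d["page_content"] for d in docs)


def stuff_with_names(docs, separator="\n\n"):
    """Variant that prefixes each chunk with its programme name from metadata."""
    return separator.join(
        f"Programme: {d['metadata']['nom_filliere']}\n{d['page_content']}"
        for d in docs
    )


default_context = stuff_default(retrieved)
named_context = stuff_with_names(retrieved)

# With the default formatting the exact programme names never appear in the context.
print("MST : Mobiquité et Big Data" in default_context)
print(named_context)
```

If that assumption holds, the model never sees the exact value of nom_filliere and has to make up a name from the descriptions. In LangChain itself, a similar effect might be achievable by passing `document_prompt=PromptTemplate.from_template("Programme: {nom_filliere}\n{page_content}")` to `create_stuff_documents_chain`, provided every document carries a nom_filliere key.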
Now, when I use this chain to answer the question "I want to continue my studies in AI, give me the best master's name to continue in" (the question was asked in French), here is the response:
Master en Intelligence Artificielle et Big Data: Your Gateway to AI
If you want to pursue a career in the exciting field of artificial intelligence, the Faculté des Sciences et Technologies de Tanger offers a master's perfectly suited to your aspirations: the Master en Intelligence Artificielle et Big Data.
This program of excellence... etc.
The problem is that the university doesn't have a master named "Intelligence Artificielle et Big Data". Looking at the context used to generate the answer:
[Document(page_content="description and objectifs of the master ... ", metadata={'diplome': 'Master en Sciences et Techniques', 'email_coordinateur': '...', 'modules': "[...]", 'nom_coordinateur': '...', 'nom_filliere': 'MST : Intelligence Artificielle et Sciences de Données'}),
 Document(page_content='description and objectifs of the master ... ', metadata={'diplome': 'Master en Sciences et Techniques', 'email_coordinateur': '...', 'modules': '[....]', 'nom_coordinateur': '...', 'nom_filliere': 'MST : Mobiquité et Big Data'})]
As you can see, the answer combines the names of these two masters: it took "Intelligence Artificielle" from the first one and "Big Data" from the second one. How can I change this behavior? Is there a prompt change or a parameter I don't know about? Thank you for your time!
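One possible guard, sketched here as an assumption rather than a known fix, is to check the generated answer against the programme names in the retrieved metadata and reject or regenerate it when none of them appears verbatim. The helper names below (`normalize`, `allowed_names`, `mentions_known_programme`) are hypothetical, and stripping the "MST :" prefix is a guess at how the names are written.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace for tolerant matching."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip().lower()


def allowed_names(metadatas, key="nom_filliere"):
    """Collect programme names from metadata, dropping a prefix such as 'MST :'."""
    names = [meta.get(key, "").split(":", 1)[-1].strip() for meta in metadatas]
    return [name for name in names if name]


def mentions_known_programme(answer, metadatas):
    """Return True if the answer contains at least one retrieved programme name."""
    haystack = normalize(answer)
    return any(normalize(name) in haystack for name in allowed_names(metadatas))


# Metadata values copied from the retrieved documents shown in the question.
retrieved_meta = [
    {"nom_filliere": "MST : Intelligence Artificielle et Sciences de Données"},
    {"nom_filliere": "MST : Mobiquité et Big Data"},
]

bad_answer = "... le Master en Intelligence Artificielle et Big Data."
good_answer = "... le MST : Intelligence Artificielle et Sciences de Données."

bad_ok = mentions_known_programme(bad_answer, retrieved_meta)
good_ok = mentions_known_programme(good_answer, retrieved_meta)
print(bad_ok, good_ok)
```

This only detects the blended name after the fact. It does not stop the model from producing it, so it would sit alongside any prompt or context change rather than replace one.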