Reputation: 65
I am trying to put together a simple "Q&A with sources" using Langchain and a specific URL as the source data. The URL consists of a single page with quite a lot of information on it.
The problem is that RetrievalQAWithSourcesChain
is only giving me the entire URL back as the source of the results, which is not very useful in this case.
Is there a way to get more detailed source info? Perhaps the heading of the specific section on the page? A clickable URL to the correct section of the page would be even more helpful!
I am slightly unsure whether the generating of the result source
is a function of the language model, URL loader or simply RetrievalQAWithSourcesChain
alone.
I have tried using UnstructuredURLLoader
and SeleniumURLLoader
with the hope that perhaps more detailed reading and input of the data would help - sadly not.
Relevant code excerpt:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=VectorStore.as_retriever())
result = chain({"question": question})
print(result['answer'])
print("\n Sources : ",result['sources'] )
Upvotes: 6
Views: 8821
Reputation: 756
I tried a thousand times and finally settled on this format using json dumps, it also doesn't care what file format you had, you can use this with CSV, PPT or PDF files, it will output the metadata for each one. It's the only solution I found that provides the most bang for the buck with output / extract sources.
res_dict = {
"answer_from_llm": response["result"], ### looks up result key from raw output
}
res_dict["source_documents"] = [] ### create an empty array for source documents key front result dict
for each_source in response["source_documents"]:
res_dict["source_documents"].append({
"page_content": each_source.page_content,
"metadata": each_source.metadata
})
print(json.dumps(res_dict["source_documents"], indent=4, default=str))
output is very clean json:
> Finished chain.
=======PRINT PARTIAL ========
[
{
"page_content": "Event Name: Las Vegas Strip Helicopter tour\nHost: Self-guided\nLocation: 3500 Las Vegas blvd\nEvent Category: Fun in Vegas\nDate: 25-Mar\nTime start: 7:00 AM\nTime end: 9:00 AM\n:",
"metadata": {
"source": "csv-files/events-small-csv.csv",
"row": 33
}
},
{
"page_content": "Event Name: Grand Canyon tour\nHost: Self-guided\nLocation: Luxor\nEvent Category: Fun in Vegas\nDate: 25-Mar\nTime start: 12:00 PM\nTime end: 4:00 PM\n:",
"metadata": {
"source": "csv-files/events-small-csv.csv",
"row": 32
}
},
{
"page_content": "Event Name: Immersive film at the Sphere\nHost: Self-guided\nLocation: 225 Sands Ave.\nEvent Category: Fun in Vegas\nDate: 25-Mar\nTime start: 7:00 AM\nTime end: 9:00 AM\n:",
"metadata": {
"source": "csv-files/events-small-csv.csv",
"row": 34
}
}
]
============================================
============================================
Upvotes: 0
Reputation: 3704
ChatGPT is very flexible, and the more explicit you are better results you can get. This link show the docs for the function you are using. there is a parameter for langchain.prompts.BasePromptTemplate that allows you to give ChatGPT more explicit instructions.
It looks like the base prompt template is this
Use the following knowledge triplets to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:
You can add in another sentence giving ChatGPT more clear instructions
Please format the answer with JSON of the form { "answer": "{your_answer}", "relevant_quotes": ["list of quotes"] }. Substitutde your_answer as the answer to the question, but also include relevant quotes from the source material in the list.
You may need to tweak it a little bit to get ChatGPT responding well. Then you should be able to parse it.
ChatGPT has 3 message types in the API
I strongly recommend these courses on ChatGPT since they are from Andrew Ng and very high quality.
Upvotes: 3