Retrieving "source documents" on a RAG setup with langchain / llama

Question

I have a set of PDF documents (over 1000) that I've converted to text files. Let's call them "doc0001.txt", "doc0002.txt", etc. I've built a RAG setup to query these documents.

Say doc0001.txt has references that list docA, docB, docC, etc.

I have code that queries against this text corpus like this:

prompt = "Tell me about artificial intelligence in medicine"
output = qa_llm({'query': prompt})

print(output["result"], '|'.join(doc.page_content for doc in output['source_documents']))

It works! Kinda. But it doesn't give me what I wanted or expected. The answer it gives lists sources that are cited INSIDE the documents "doc0001.txt", "doc0002.txt", etc.

That is, it lists docA, docB, etc.

That's useful, but in this case what I need to know is which of the source documents that I provided contain the information - not the references listed inside those documents. That is, the answer I want (in this case) is doc0357.txt, doc0784.txt, etc.

Is there a command to get THAT information?
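
To make the distinction concrete, here is a minimal sketch of where the file names would normally live. It assumes the files were loaded with a loader that records each file's path under metadata["source"] (LangChain's TextLoader does this), and that the chain was built so that 'source_documents' is present in the output, as it is in the snippet above. The Document class below is only a stand-in so the example runs without LangChain installed, and the corpus/ paths and the source_files helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Stand-in for LangChain's Document: chunk text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def source_files(output: dict) -> list[str]:
    """Return the unique file names behind the retrieved chunks, in order."""
    names = []
    for doc in output["source_documents"]:
        # Assumption: the loader stored the file path under metadata["source"].
        name = Path(doc.metadata.get("source", "unknown")).name
        if name not in names:
            names.append(name)
    return names


# Hypothetical output, shaped like the dict returned by qa_llm({'query': prompt}).
output = {
    "result": "...",
    "source_documents": [
        Document("AI in radiology [docA]", {"source": "corpus/doc0357.txt"}),
        Document("Clinical decision support [docB]", {"source": "corpus/doc0784.txt"}),
        Document("More on radiology [docC]", {"source": "corpus/doc0357.txt"}),
    ],
}

print(source_files(output))
```

The point of the sketch is that page_content is the text of the retrieved chunk, which is why the references cited inside the files (docA, docB) show up when it is printed, while the file the chunk came from sits in metadata. If metadata turns out to be empty in a given setup, the source path probably has to be attached when the documents are loaded and split.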
