Reputation: 1
I've been working on an LLM (Large Language Model) project to convert PDFs into chat-based summaries. Each PDF contains separate paragraphs, and I need to merge the "context" paragraph with the rest of the content before summarization. With a dataset of 10,000 PDFs, however, doing this manually is out of the question.
Can anyone guide me on how to use an LLM effectively for this task, or suggest alternative approaches if an LLM isn't the right solution?
Upvotes: 0
Views: 491
Reputation: 2470
You can use LangChain's DirectoryLoader to load all the PDF files from a folder (by default it parses each file with the unstructured package):
from langchain_community.document_loaders import DirectoryLoader

# Recursively load every PDF under the parent directory
loader = DirectoryLoader('../', glob="**/*.pdf", show_progress=True)
docs = loader.load()
print(len(docs))  # number of documents loaded
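Loading only gets the documents into memory; the question also asks about merging the "context" paragraph with the rest of each document. Here is a minimal sketch of that step, assuming paragraphs are separated by blank lines and the context paragraph starts with a marker such as "Context:" (both assumptions, to be adjusted for your actual PDFs):

```python
# Hypothetical sketch: pull out a labeled "context" paragraph and put it
# first, so every chunk passed to the LLM carries the context.
# Assumes blank-line-separated paragraphs and a "Context:" marker.

def merge_context(text: str, marker: str = "Context:") -> str:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    context = next((p for p in paragraphs if p.startswith(marker)), "")
    body = [p for p in paragraphs if not p.startswith(marker)]
    # Prepend the context paragraph (if found) to the remaining content
    return "\n\n".join(([context] if context else []) + body)

sample = "Intro paragraph.\n\nContext: This report covers Q3.\n\nDetails paragraph."
print(merge_context(sample))
```

You would run this over `doc.page_content` for each loaded document before handing the merged text to your summarization chain.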
Upvotes: 0