Ajai k

Reputation: 1

How to Use an LLM to Summarize PDFs with Separate Paragraphs

I've been working on an LLM (large language model) project to convert PDFs into chat-based summaries. Each PDF contains separate paragraphs, and I need to merge the "context" paragraph with the rest of the content before summarizing. With a dataset of 10,000 PDFs, doing this manually is out of the question.

Can anyone guide me on using an LLM effectively for this task, or suggest alternative approaches if an LLM isn't the right solution?

Upvotes: 0

Views: 491

Answers (1)

j3ffyang

Reputation: 2470

You can use the DirectoryLoader class to load a large number of files from a folder:

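from langchain_community.document_loaders import DirectoryLoader

# Recursively match every PDF under the parent directory (show_progress needs tqdm).
loader = DirectoryLoader('../', glob="**/*.pdf", show_progress=True)

# Load every matched file into a list of Document objects.
docs = loader.load()

# Number of documents loaded.
print(len(docs))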
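
Once the files are loaded, the "merge the context paragraph with the rest" step from the question could look roughly like the sketch below. It rests on several assumptions that are not in the original post: the context paragraph is the first paragraph of each document, paragraphs are separated by blank lines, and each remaining paragraph gets its own prompt. The names split_paragraphs, build_prompts, and summarize_document are hypothetical helpers, and llm stands in for whatever function you use to call your model.

```python
import re
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def build_prompts(text: str) -> List[str]:
    """Pair the context paragraph with every other paragraph.

    Assumption: the first paragraph is the "context" paragraph.
    Returns an empty list when there is nothing to pair the context with.
    """
    paragraphs = split_paragraphs(text)
    if len(paragraphs) < 2:
        return []
    context, rest = paragraphs[0], paragraphs[1:]
    return [
        f"Context:\n{context}\n\nParagraph:\n{p}\n\n"
        "Summarize the paragraph, using the context where relevant."
        for p in rest
    ]


def summarize_document(text: str, llm: Callable[[str], str]) -> List[str]:
    """Send one prompt per paragraph to llm (a hypothetical callable)."""
    return [llm(prompt) for prompt in build_prompts(text)]
```

With the loader above, each loaded Document exposes its text as page_content, so a call would look like summarize_document(doc.page_content, llm) inside a loop over docs. If the context paragraph is not always first in your PDFs, build_prompts is the place to change how it is located.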

Upvotes: 0
