Rajeev Ghai

Reputation: 1

Not able to read a PDF document in LangChain

I am trying to read a PDF document in LangChain but get the error below. pypdf is already installed. Can you please advise how to resolve it?

from langchain.document_loaders import PyPDFLoader

file_path="C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf"

pdf_loader = PyPDFLoader(file_path=file_path)

ValueError                                Traceback (most recent call last)
Cell In[46], line 1
----> 1 pdf_loader = PyPDFLoader(file_path=file_path)

File ~\Anaconda3\envs\langchain\Lib\site-packages\langchain\document_loaders\pdf.py:157, in PyPDFLoader.__init__(self, file_path, password, headers, extract_images)
    153 except ImportError:
    154     raise ImportError(
    155         "pypdf package not found, please install it with " "pip install pypdf"
    156     )
--> 157 super().__init__(file_path, headers=headers)
    158 self.parser = PyPDFParser(password=password, extract_images=extract_images)

File ~\Anaconda3\envs\langchain\Lib\site-packages\langchain\document_loaders\pdf.py:100, in BasePDFLoader.__init__(self, file_path, headers)
     98     self.file_path = str(temp_pdf)
     99 elif not os.path.isfile(self.file_path):
--> 100     raise ValueError("File path %s is not a valid file or url" % self.file_path)

ValueError: File path C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf is not a valid file or url

I want to read the PDF document.

Upvotes: 0

Views: 526

Answers (1)

j3ffyang

Reputation: 2470

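From the exception, there are two things to check.

  1. Make sure pypdf is installed in the active environment: pip install pypdf
  2. The file_path is not correctly specified. The ValueError is raised because os.path.isfile() returns False for it, so confirm the file exists at exactly that path (folder name, file name, and extension). Then try this: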
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_path/layout-parser-paper.pdf")  # Don't forget the double-quote
pages = loader.load_and_split()
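
Since the ValueError comes from an os.path.isfile() check, it can help to verify the path yourself before creating the loader. Below is a minimal sketch using only the standard library; check_pdf_path is a hypothetical helper name, not part of LangChain. If the file is missing, it lists the PDFs that do exist in that folder, which makes a misspelled folder or file name easier to spot (for example, whether the folder is really named LanChain rather than LangChain, which is only a guess here).

```python
from pathlib import Path


def check_pdf_path(file_path: str) -> Path:
    """Return the path if it points to an existing file, else raise with a hint.

    Hypothetical helper for illustration; not part of LangChain.
    """
    path = Path(file_path).expanduser()
    if path.is_file():
        return path

    parent = path.parent
    if not parent.is_dir():
        # The folder itself is wrong, e.g. a misspelled directory name.
        raise FileNotFoundError(f"Folder does not exist: {parent}")

    # The folder exists but the file does not: show which PDFs are there.
    candidates = sorted(p.name for p in parent.glob("*.pdf"))
    raise FileNotFoundError(
        f"No file named {path.name!r} in {parent}. PDFs found there: {candidates}"
    )


# Usage with the loader from the answer above (left as comments because it
# needs langchain_community installed and a real file on disk):
# pdf_path = check_pdf_path("C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf")
# loader = PyPDFLoader(str(pdf_path))
# pages = loader.load_and_split()
```

If the helper raises, fix the path it reports before calling PyPDFLoader; if it returns, the loader's own path check should pass as well.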

Upvotes: 0
