Reputation: 1
I'm trying to read a PDF document in LangChain, but I get the error below. pypdf is already installed. Can you please advise how to resolve it?
from langchain.document_loaders import PyPDFLoader

file_path = "C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf"
pdf_loader = PyPDFLoader(file_path=file_path)
ValueError                                Traceback (most recent call last)
Cell In[46], line 1
----> 1 pdf_loader = PyPDFLoader(file_path=file_path)

File ~\Anaconda3\envs\langchain\Lib\site-packages\langchain\document_loaders\pdf.py:157, in PyPDFLoader.__init__(self, file_path, password, headers, extract_images)
    153 except ImportError:
    154     raise ImportError(
    155         "pypdf package not found, please install it with " "pip install pypdf"
    156     )
--> 157 super().__init__(file_path, headers=headers)
    158 self.parser = PyPDFParser(password=password, extract_images=extract_images)

File ~\Anaconda3\envs\langchain\Lib\site-packages\langchain\document_loaders\pdf.py:100, in BasePDFLoader.__init__(self, file_path, headers)
     98     self.file_path = str(temp_pdf)
     99 elif not os.path.isfile(self.file_path):
--> 100     raise ValueError("File path %s is not a valid file or url" % self.file_path)

ValueError: File path C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf is not a valid file or url
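The traceback shows the two checks PyPDFLoader runs before any parsing: lines 153-156 guard against a missing pypdf package, and line 100 raises the ValueError when os.path.isfile(file_path) is False (URLs go through a separate download branch). Because the error actually raised is the ValueError, the pypdf import guard passed, which points at the path rather than the package. Below is a minimal sketch that repeats both checks with only the standard library so they can be run in the same notebook. check_pdf_loader_preconditions is a hypothetical helper, not part of LangChain, and it deliberately ignores the URL branch.

```python
import importlib.util
import os


def check_pdf_loader_preconditions(file_path: str) -> list[str]:
    """Hypothetical helper: list problems that would stop PyPDFLoader early.

    Mirrors the two checks visible in the traceback for a local path;
    the loader's URL-download branch is not handled here.
    """
    problems = []
    # Equivalent of the `import pypdf` guard: is the package importable here?
    if importlib.util.find_spec("pypdf") is None:
        problems.append("pypdf is not importable in this environment")
    # Same test that raised the ValueError in BasePDFLoader.__init__
    if not os.path.isfile(file_path):
        problems.append(f"not an existing file: {file_path!r}")
    return problems


# Example: an empty list means both checks would pass.
print(check_pdf_loader_preconditions(
    "C:/Users/gbzr7856/Desktop/LanChain/Data Science from Scratch.pdf"
))
```

If only the "not an existing file" entry appears, reinstalling pypdf will not help; the path string itself needs fixing.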
I just want to read the PDF document.
Upvotes: 0
Views: 526
Reputation: 2470
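From the exception, you ought to take two actions:

1. Install pypdf if it is not already installed:
pip install pypdf

2. Check that file_path is correctly specified. The ValueError is raised by BasePDFLoader.__init__ when os.path.isfile(file_path) is False, i.e. the path does not point to an existing file. Try this:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_path/layout-parser-paper.pdf")  # The path must be a quoted string
pages = loader.load_and_split()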
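To pin down which part of the path is wrong before building the loader, here is a sketch assuming a local file. resolve_pdf_path is a hypothetical helper, not part of LangChain: it resolves the path with pathlib and, when the file is missing, reports the PDFs that do exist in the parent folder (or that the folder itself is missing), which makes a misspelled folder or file name easy to spot.

```python
from pathlib import Path


def resolve_pdf_path(raw_path: str) -> Path:
    """Hypothetical helper: return an absolute Path to an existing file,
    or raise FileNotFoundError with a hint about what is actually there."""
    path = Path(raw_path).expanduser().resolve()
    if path.is_file():
        return path
    parent = path.parent
    if parent.is_dir():
        # List PDFs next to the expected file to help spot typos
        # (the glob is case-sensitive on Linux/macOS, not on Windows).
        candidates = sorted(p.name for p in parent.glob("*.pdf"))
        hint = f"PDFs found in {parent}: {candidates}"
    else:
        hint = f"directory does not exist: {parent}"
    raise FileNotFoundError(f"{path} is not a file; {hint}")
```

If the call returns without raising, pass str() of the returned path to PyPDFLoader as in the snippet above; if it raises, the message shows whether the folder or the file name is the part that does not match.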
Upvotes: 0