Reputation: 1
I want to extract the text from PDF files, but with this code the total number of pages comes out as 0. How can I fix it so that I get the correct page count for a PDF?
Upvotes: 0
Views: 2146
Reputation:
To get the number of pages with pypdf (PyPDF2 is deprecated):
from pypdf import PdfReader
reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
Upvotes: 2
Reputation: 744
The reader's .pages attribute gives you this:
from PyPDF2 import PdfReader
# Read the pdf
reader = PdfReader("US_Declaration.pdf")
# Count the total number of pages
number_of_pages = len(reader.pages)
Upvotes: 0
Reputation: 9012
(Disclaimer: I am the author of pText, the library used in this answer.)
As an alternative to PyPDF2, you could also try pText.
1. Load the document
# Assumed import path; check it against your installed pText version
from ptext.pdf.pdf import PDF
with open("input.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle)
2. Get the DocumentInfo
doc_info = doc.get_document_info()
number_of_pages = doc_info.get_number_of_pages()
You can obtain pText either on GitHub or from PyPI. There are many more examples there; check them out to learn more about working with images.
Upvotes: 0