Reputation: 59
I tried to print the page count of a PDF document that includes some blank white pages, using the pyPdf module. But it skips the blank pages and prints only the count of the remaining pages. Below is the code.
import sys
import pyPdf
from pyPdf import PdfFileReader, PdfFileWriter
pdf_document = PdfFileReader(file(normalpdfpath,"r"))
normal = pdf_document.getNumPages()
print normal
Upvotes: 5
Views: 9883
Reputation: 690
You may try this, which worked for me:
import re
import os

rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)

def count_pages(filename):
    data = file(filename,"rb").read()
    return len(rxcountpages.findall(data))

if __name__=="__main__":
    parent = "/Users/username/"
    os.chdir(parent)
    filename = 'LaTeX20120726.pdf'
    print count_pages(filename)
For Python 3.6+:
import re

rxcountpages = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)

def count_pages(filename: str) -> int:
    with open(filename, "rb") as infile:
        data = infile.read()
    return len(rxcountpages.findall(data))

if __name__=="__main__":
    filename = '/Users/username/LaTeX20120726.pdf'
    print(count_pages(filename))
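To see what the pattern actually counts, here is a minimal sketch that runs the same regex over a hand-written byte string instead of a real file. The `sample` bytes and the name `PAGE_RE` are made up for illustration; they only imitate how page objects can look in an uncompressed PDF.

```python
import re

# Same pattern as above: "/Type /Page" followed by anything except "s",
# so the "/Type /Pages" tree node is not counted as a page.
PAGE_RE = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)

# Hypothetical, simplified PDF objects: one page tree node and two pages.
sample = (
    b"1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
    b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
    b"3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n"
)

page_count = len(PAGE_RE.findall(sample))
print(page_count)
```

The `\s*` lets both `/Type /Page` and `/Type/Page` match, and the `([^s]|$)` group is what keeps `/Pages` out of the total.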
Upvotes: 2
Reputation: 1284
For all you Googlers, here is an updated version of this answer and comment that uses only the standard library:
import re

# compile the regex once so repeated calls are faster
PAGE_COUNT_REGEX = re.compile(
    rb"/Type\s*/Page([^s]|$)",
    re.MULTILINE|re.DOTALL
)

def get_page_count(floc, regex=PAGE_COUNT_REGEX):
    """Count the number of pages in a PDF"""
    with open(floc, "rb") as f:
        return len(regex.findall(f.read()))

get_page_count("path/to/your/file.pdf")
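One caveat worth knowing about the regex approach: it only sees page dictionaries that are stored as plain text in the file. If a PDF keeps its objects inside compressed streams, the raw bytes no longer contain `/Type /Page`, and the count may come out too low. The sketch below illustrates this with `zlib` on made-up data; `raw_objects` is a hypothetical stand-in, not the layout of a real PDF.

```python
import re
import zlib

PAGE_RE = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)

# Hypothetical page dictionaries as they would look uncompressed.
raw_objects = b"<< /Type /Page /Parent 1 0 R >>\n" * 3

# Compress them the way a Flate-encoded stream would be stored on disk.
compressed = zlib.compress(raw_objects)

# The regex finds nothing in the compressed bytes...
visible_before = len(PAGE_RE.findall(compressed))
# ...but finds every page once the data is decompressed.
visible_after = len(PAGE_RE.findall(zlib.decompress(compressed)))
print(visible_before, visible_after)
```

If the regex count looks suspiciously low for a given file, a full PDF parser is likely the safer choice.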
Upvotes: -1
Reputation: 139
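Step 1: install PyPDF2 (and requests, which the code below also uses):
pip install PyPDF2 requests
Step 2: download the PDF and count its pages:
import requests, PyPDF2, io

# must be the full http(s) URL of the PDF; a bare file name will not work with requests
url = 'sample.pdf'
response = requests.get(url)

# wrap the downloaded bytes in a file-like object that PyPDF2 can read
with io.BytesIO(response.content) as open_pdf_file:
    # PdfFileReader/getNumPages are the legacy names; PyPDF2 3.0+ uses
    # PyPDF2.PdfReader(open_pdf_file) and len(reader.pages) instead
    read_pdf = PyPDF2.PdfFileReader(open_pdf_file)
    num_pages = read_pdf.getNumPages()
    print(num_pages)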
Upvotes: 4