Sriram

Reputation: 165

Getting PDF Version using Python

I need to extract the PDF version from a PDF document. I tried PDFMiner, but it only provides the following information:

  1. PDF Producer
  2. Created
  3. Modified
  4. Application

Below is the code I tried:

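from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open in binary mode; the with-block closes the file when done
with open("ibs.servlets.pdf", "rb") as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    parser.set_document(doc)
    # doc.info is a list of document-information dictionaries (Producer, CreationDate, ...)
    if doc.info:
        print(doc.info[0])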

Are there any libraries other than PDFMiner that I can use?

Upvotes: 1

Views: 1063

Answers (1)

Frodon

Reputation: 3775

The PDF version is stored in the file header, a comment on the first line of the file (for example %PDF-1.5). I couldn't find a way to get this information with pdfminer's PDFParser, but with PyPDF2 I could read it manually:

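from PyPDF2 import PdfFileReader  # PyPDF2 1.x API; PdfFileReader was removed in PyPDF2 3.0

doc = PdfFileReader('ibs.servlets.pdf')
# Rewind to the start: parsing leaves the stream positioned elsewhere,
# and the header comment is not part of the parsed object tree
doc.stream.seek(0)
print(doc.stream.readline().decode().strip())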

Output:

%PDF-1.5
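If you'd rather not depend on a PDF library at all, the header can be read directly from the raw bytes. Below is a minimal standard-library sketch, not a complete PDF parser: the function name read_pdf_header_version is hypothetical, and scanning the first 1024 bytes (rather than strictly the first line) is an assumption made to tolerate files with a few bytes of leading junk. Keep in mind that from PDF 1.4 onward the document catalog may carry a /Version entry that overrides the header when it names a later version, so the header alone can understate the version.

```python
import re
from typing import Optional

# Matches the header comment, e.g. b"%PDF-1.5"; captures the "1.5" part
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def read_pdf_header_version(path: str) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if no header is found.

    Assumption: the header appears within the first 1024 bytes of the file.
    This does NOT check the catalog's /Version entry, which may override it.
    """
    with open(path, "rb") as fp:
        head = fp.read(1024)
    match = _HEADER_RE.search(head)
    return match.group(1).decode("ascii") if match else None
```

For the file above, read_pdf_header_version("ibs.servlets.pdf") should return "1.5". Reading bytes and matching with a bytes regex avoids decoding errors, since the line after the header is usually a comment full of non-ASCII bytes.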

Upvotes: 2
