Reputation: 165
I need to extract the PDF version from a PDF document. I tried pdfminer, but it only provides the document info, not the PDF version.
Below is the code I tried:
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
fp = open("ibs.servlets.pdf", 'rb')
parser = PDFParser(fp)
doc = PDFDocument(parser)
parser.set_document(doc)
if len(doc.info) > 0:
    info = doc.info[0]
    print(info)
Are there any other libraries apart from pdfminer that I can use?
Upvotes: 1
Views: 1063
Reputation: 3775
The PDF version is stored as a comment on the first line of the PDF file. I couldn't find a way to get this information with pdfminer, but with PyPDF2 I could retrieve it manually:
from PyPDF2 import PdfFileReader
doc = PdfFileReader('ibs.servlets.pdf')
doc.stream.seek(0) # Rewind to the start: the parser skips the header comment, so read it manually
print(doc.stream.readline().decode())
Output:
%PDF-1.5
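Since the version is just the header comment, it can also be read without any PDF library. Below is a minimal sketch using only the standard library; the names `pdf_header_version` and `read_pdf_version` are hypothetical helpers, not part of pdfminer or PyPDF2. It assumes the `%PDF-x.y` marker appears within the first 1024 bytes, which should cover typical files but is a choice I made for this sketch.

```python
import re

# Matches the header marker, e.g. b"%PDF-1.5", and captures "1.5".
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes):
    """Return the version from a PDF header as a string, or None if no header is found."""
    # Assumption: only the first 1024 bytes are searched for the marker.
    match = _HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path):
    """Read the start of the file at `path` and return its header version."""
    with open(path, "rb") as fp:
        return pdf_header_version(fp.read(1024))
```

For the file from the question, the call would be `read_pdf_version("ibs.servlets.pdf")`. Note that this only reports the header value: from PDF 1.4 onwards a document may also carry a `/Version` entry in its catalog that overrides the header, and this sketch does not check it.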
Upvotes: 2