Hugo Pinho

Converting PDF page to JPG returns a blank image

I have a function that asks the user for a PDF file and receives the number of the page they want to convert into an image. It usually works fine, but with a few PDFs the image it produces is blank and about 4 MB in size. It seems to have something to do with the size of the file. Is there a way to solve this problem?

import os
from tkinter.filedialog import askopenfilename

from pdf2image import convert_from_path
from PIL import Image
# PyPDF2 3.0 removed PdfFileReader/PdfFileWriter in favour of PdfReader/PdfWriter
from PyPDF2 import PdfReader, PdfWriter

# Disable Pillow's decompression-bomb size check for very large pages
Image.MAX_IMAGE_PIXELS = None


def convert_pdf(page_number):
    """Ask for a PDF and save page page_number (0-based) as a JPEG."""
    pdf_file_path = askopenfilename()
    if not pdf_file_path:  # the dialog was cancelled
        return

    file_base_name = os.path.splitext(pdf_file_path)[0]

    # Copy the requested page into a new one-page PDF
    pdf = PdfReader(pdf_file_path)
    pdf_writer = PdfWriter()
    pdf_writer.add_page(pdf.pages[page_number])

    nome = f'{file_base_name}_subset.pdf'
    with open(nome, 'wb') as f:  # the with block closes the file
        pdf_writer.write(f)

    # Render the one-page PDF with Poppler (pdf2image's default is 200 DPI)
    pages = convert_from_path(nome, poppler_path=r'C:\Program Files\poppler-0.68.0\bin')
    i = 1

    name = os.path.splitext(os.path.basename(nome))[0]

    for page in pages:
        image_name = f'Page_{i}{name}.jpg'
        page.save(image_name, 'JPEG')
        i += 1

Answers (1)

Hugo Pinho

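The solution was to lower the DPI (the dpi parameter of convert_from_path), but only for the PDFs that fail. For every other file it is important to keep the default DPI (200 in pdf2image), because I found that at a low DPI certain images become really small and therefore unreadable. So I first convert at the default DPI and, if that raises an exception, retry at 25 DPI: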

try:
    pages = convert_from_path(nome, poppler_path=r'C:\Program Files\poppler-0.68.0\bin')
except Exception:
    # Fallback for oversized pages: render at 25 DPI instead of the default 200
    pages = convert_from_path(nome, dpi=25, poppler_path=r'C:\Program Files\poppler-0.68.0\bin')
i = 1
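A rough way to see why lowering the DPI helps, and to choose a DPI per file instead of hard-coding 25: Poppler renders a page of W x H points (1 pt = 1/72 inch) at roughly W*dpi/72 x H*dpi/72 pixels, so a large-format page at the default 200 DPI can produce an enormous bitmap. The sketch below is a minimal, hypothetical helper (rendered_size and pick_dpi are not part of pdf2image). It assumes the failures come from the rendered image being too large, and it uses Pillow's default Image.MAX_IMAGE_PIXELS value (89,478,485) as the pixel budget, which is only an example choice.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are given in points (1/72 inch)
# Pillow's default Image.MAX_IMAGE_PIXELS, used here only as an example budget
DEFAULT_PIXEL_BUDGET = 89_478_485


def rendered_size(width_pt, height_pt, dpi):
    """Approximate (upper-bound) pixel size of a page rendered at dpi."""
    return (math.ceil(width_pt * dpi / POINTS_PER_INCH),
            math.ceil(height_pt * dpi / POINTS_PER_INCH))


def pick_dpi(width_pt, height_pt, max_pixels=DEFAULT_PIXEL_BUDGET, default_dpi=200):
    """Largest DPI <= default_dpi whose rendered page fits in max_pixels.

    Hypothetical helper: returns at least 1 even if the page is still too large.
    """
    # Solve (w * dpi / 72) * (h * dpi / 72) <= max_pixels for dpi
    dpi = math.floor(POINTS_PER_INCH * math.sqrt(max_pixels / (width_pt * height_pt)))
    dpi = min(default_dpi, dpi)
    # Step down in case rounding pushed the pixel count just over the budget
    while dpi > 1:
        w, h = rendered_size(width_pt, height_pt, dpi)
        if w * h <= max_pixels:
            break
        dpi -= 1
    return max(dpi, 1)


print(pick_dpi(595, 842))      # A4 page -> 200 (the default DPI is fine)
print(pick_dpi(14400, 14400))  # 200 x 200 inch page -> 47
```

To use it you need the page size in points; with PyPDF2 3.x it is available as, for example, float(reader.pages[page_number].mediabox.width) and .height. The result can then be passed as convert_from_path(nome, dpi=pick_dpi(w, h), poppler_path=...). Note also that convert_from_path accepts first_page and last_page (1-based), so a single page can be rendered straight from the original file without writing the _subset.pdf copy.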
