David Delos

Convert PDF file to multipage image

I'm trying to convert a multipage PDF file to images with PyMuPDF:

import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.load_page(0)  # number of page (0-based)
pix = page.get_pixmap()
output = "output.tif"
pix.save(output, output="png")  # writes PNG data despite the .tif name

But I need to convert all pages of the PDF into a single multi-page TIFF. When I pass a page range as the page argument, it only takes one page. Does anyone know how I can do this?

Upvotes: 4

Views: 15243

Answers (4)

Jorj McKie

PyMuPDF supports a limited set of image types for output. TIFF is not among them.

However, there is an easy way to interface with Pillow, which supports multiframe TIFF output.

Upvotes: 1

Roizy Kish

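import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
for i, page in enumerate(doc, start=1):
    pix = page.get_pixmap()
    output = f"output_{i}.tif"
    # Pixmap.save() cannot write TIFF, so let Pillow do it (requires Pillow).
    # This writes one single-page TIFF per PDF page.
    pix.pil_save(output)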
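The loop above writes one file per page rather than the single multi-page TIFF the question asks for. If the per-page files already exist, a small sketch like the following could merge them with Pillow. It assumes the `output_<n>.tif` naming from that loop; `page_index` and `merge_to_multipage_tiff` are illustrative names. The numeric sort key matters because a plain string sort puts `output_10.tif` before `output_2.tif`.

```python
import os
import re

from PIL import Image


def page_index(path: str) -> int:
    """Trailing number of a name like 'output_12.tif' (hypothetical naming scheme)."""
    match = re.search(r"(\d+)\D*$", os.path.basename(path))
    return int(match.group(1)) if match else 0


def merge_to_multipage_tiff(paths, output: str) -> None:
    """Write the given image files, in the order given, as frames of one TIFF."""
    if not paths:
        raise ValueError("no input images")
    frames = [Image.open(p) for p in paths]
    try:
        # The first image is saved normally; the rest become extra frames.
        frames[0].save(output, save_all=True, append_images=frames[1:])
    finally:
        for frame in frames:
            frame.close()
```

For example, sort `glob.glob("output_*.tif")` with `key=page_index` and pass the result to `merge_to_multipage_tiff` together with the target file name.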

Upvotes: 1

ZdPo Ster

import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
# Pillow compression names: "tiff_adobe_deflate" (zip), "tiff_lzw",
# or "group4" (group4 needs a binarized, mode "1" image)
compression = "tiff_adobe_deflate"

zoom = 2  # to increase the resolution (pages render at 72 dpi by default)
mat = fitz.Matrix(zoom, zoom)
dpi = 72 * zoom  # resolution the pages are actually rendered at

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    pix = page.get_pixmap(matrix=mat)  # RGB without alpha by default
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    image_list.append(img)

if image_list:
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(dpi, dpi),
    )

Upvotes: 7

liamsuma

To convert every page of the PDF, you need a for loop over the pages. To get a higher resolution than the default, also pass a zoom matrix (matrix=mat) when rendering each page. Here is a code snippet (not sure if this is exactly what you wanted, but it converts every page of the PDF to a separate image):

import os

import fitz  # PyMuPDF

pdf_file = "input.pdf"
doc = fitz.open(pdf_file)
zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)
noOfPages = doc.page_count
image_folder = '/path/to/where/to/save/your/images'

for pageNo in range(noOfPages):
    page = doc.load_page(pageNo)  # number of page
    pix = page.get_pixmap(matrix=mat)

    # save() picks the format from the extension; change it accordingly
    output = os.path.join(image_folder, str(pageNo) + '.png')
    pix.save(output)
    print('Converting PDFs to Image ... ' + output)
    # do your things afterwards

For more on resolution, there is a good example on GitHub that demonstrates what the zoom matrix does and how to use it for your case, if needed.

Upvotes: 4
