How to create a PIL.Image from PDF image XObjects using pikepdf in Python

Question

I am trying to losslessly recompress the images in PDFs as PNG using Pillow. Here is the part of my code that accesses the image XObjects and tries to create a PIL.Image object from each one:

import io
import pikepdf
from PIL import Image

with pikepdf.open("./doc.pdf") as pdf:
    for page in pdf.pages:
        for image_key, image_data in page.images.items():
            raw_data_stream = image_data.get_raw_stream_buffer()
            img_data_io = io.BytesIO(raw_data_stream)
            img_data_io.seek(0)
            img = Image.open(img_data_io)

This raises PIL.UnidentifiedImageError: cannot identify image file.

I've tried changing it to:

img = Image.open(img_data_io.read())

But this raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 1: invalid continuation byte. I've tried this on 25 different PDFs. The offending byte differs between them (e.g., 0x83), but they all raise this error.

This is the contents of image_data:

<pikepdf.Stream(owner=<...>, data=<...>, {
  "/BitsPerComponent": 4,
  "/ColorSpace": [ "/Indexed", [ "/ICCBased", pikepdf.Stream(owner=<...>, data=<...>, {
    "/Alternate": "/DeviceRGB",
    "/Filter": "/FlateDecode",
    "/Length": 2598,
    "/N": 3
  }) ], 15, pikepdf.Stream(owner=<...>, data=<...>, {
    "/Length": 49
  }) ],
  "/Filter": "/FlateDecode",
  "/Height": 326,
  "/Length": 28607,
  "/Subtype": "/Image",
  "/Type": "/XObject",
  "/Width": 1455
})>
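
Reading the dictionary above: /Filter /FlateDecode means the raw stream is zlib-compressed sample data rather than a complete image file, which is likely why Image.open cannot identify it. With /BitsPerComponent 4 and an /Indexed color space, each sample is a 4-bit index into a palette, and every row is padded to a whole byte. The following is a minimal standard-library sketch of that layout, not how pikepdf or Pillow decode it. The function name decode_indexed_4bit is hypothetical, and it assumes no /DecodeParms predictor (none is shown above) and a palette of 3 bytes per entry, ignoring the ICC profile.

```python
import zlib


def decode_indexed_4bit(raw: bytes, width: int, height: int, palette: bytes):
    """Turn a FlateDecode, 4-bit /Indexed image stream into rows of RGB tuples.

    Hypothetical helper for illustration. Assumes no predictor and a palette
    that stores 3 bytes (R, G, B) per entry.
    """
    # FlateDecode is plain zlib/deflate; the result is packed sample data.
    samples = zlib.decompress(raw)

    # Each row occupies a whole number of bytes: two 4-bit samples per byte.
    stride = (width * 4 + 7) // 8

    rows = []
    for y in range(height):
        row = samples[y * stride:(y + 1) * stride]
        pixels = []
        for x in range(width):
            byte = row[x // 2]
            # Samples are packed high-order bits first.
            index = byte >> 4 if x % 2 == 0 else byte & 0x0F
            pixels.append(tuple(palette[3 * index:3 * index + 3]))
        rows.append(pixels)
    return rows
```

For the image above, width 1455 gives a row stride of 728 bytes, so the decompressed data should be 728 * 326 bytes if these assumptions hold.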

How can I create a PIL.Image object from such an XObject pulled from a PDF?
