I am trying to apply lossless PNG compression to images in PDFs using Pillow. Here is part of my code, which accesses the image XObjects and tries to create a PIL.Image object from each one:
import io

import pikepdf
from PIL import Image

with pikepdf.open("./doc.pdf") as pdf:
    for page in pdf.pages:
        for image_key, image_data in page.images.items():
            raw_data_stream = image_data.get_raw_stream_buffer()
            img_data_io = io.BytesIO(raw_data_stream)
            img_data_io.seek(0)
            img = Image.open(img_data_io)
This raises PIL.UnidentifiedImageError: cannot identify image file.
I've tried changing the last line to:
img = Image.open(img_data_io.read())
But this raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 1: invalid continuation byte. I've tried this on 25 different PDFs; the problematic byte differs (e.g., 0x83), but they all throw this error.
This is the contents of image_data:
<pikepdf.Stream(owner=<...>, data=<...>, {
  "/BitsPerComponent": 4,
  "/ColorSpace": [ "/Indexed", [ "/ICCBased", pikepdf.Stream(owner=<...>, data=<...>, {
    "/Alternate": "/DeviceRGB",
    "/Filter": "/FlateDecode",
    "/Length": 2598,
    "/N": 3
  }) ], 15, pikepdf.Stream(owner=<...>, data=<...>, {
    "/Length": 49
  }) ],
  "/Filter": "/FlateDecode",
  "/Height": 326,
  "/Length": 28607,
  "/Subtype": "/Image",
  "/Type": "/XObject",
  "/Width": 1455
})>
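Going by this dictionary, the stream does not look like an image file at all. /Filter /FlateDecode suggests zlib-compressed raw samples, and /BitsPerComponent 4 with an /Indexed color space suggests each sample is a 4-bit palette index, packed high-order bits first, with every row padded to a whole byte. That would explain why Image.open finds no PNG or JPEG header. Below is a minimal stdlib-only sketch of that layout. The function name unpack_indexed_samples and the toy data are hypothetical, and the sketch assumes no /DecodeParms predictor is in use:

```python
import zlib


def unpack_indexed_samples(
    compressed: bytes, width: int, height: int, bits: int
) -> list[list[int]]:
    """Inflate a FlateDecode stream and split it into rows of palette indices.

    Assumes no /DecodeParms predictor and 1, 2, 4 or 8 bits per sample.
    """
    if bits not in (1, 2, 4, 8):
        raise ValueError(f"unsupported bits per sample: {bits}")

    raw = zlib.decompress(compressed)
    row_bytes = (width * bits + 7) // 8  # each row is padded to a byte boundary
    if len(raw) < row_bytes * height:
        raise ValueError("stream is shorter than width/height/bits imply")

    mask = (1 << bits) - 1
    rows = []
    for y in range(height):
        row = raw[y * row_bytes:(y + 1) * row_bytes]
        samples = []
        for x in range(width):
            bit_offset = x * bits
            byte = row[bit_offset // 8]
            # Samples are packed starting from the most significant bits.
            shift = 8 - bits - (bit_offset % 8)
            samples.append((byte >> shift) & mask)
        rows.append(samples)
    return rows


# Toy 3x2 image at 4 bits per sample: each row takes 2 bytes,
# and the last nibble of every row is padding.
toy_stream = zlib.compress(bytes([0x12, 0x30, 0xFE, 0xD0]))
toy_rows = unpack_indexed_samples(toy_stream, width=3, height=2, bits=4)
print(toy_rows)  # [[1, 2, 3], [15, 14, 13]]
```

If these assumptions hold for the image above, each row would occupy (1455 * 4 + 7) // 8 = 728 bytes, so the inflated data would be 728 * 326 = 237,328 bytes of palette indices rather than anything Pillow can identify as a file.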
How can I create a PIL.Image object from such an XObject pulled from a PDF?