GonzaloReig

Reputation: 87

Read PDF as a picture

I have some PDFs, and I want to read them as images so I can get all the pixel info.

So first I tried to convert the PDF to JPEG:

from pdf2image import convert_from_path
img = convert_from_path('mypdf.pdf')

This works. Next I tried to get the pixel info, but I get an error:

import matplotlib.pyplot as plt
pixel_img = plt.imread(img[0])

TypeError: Object does not appear to be a 8-bit string path or a Python file-like object

I don't understand this, since plt.imread() works when I use it to read an original .jpeg file. img[0] is a PIL object, so shouldn't it count as a "Python file-like object"?

I also tried the PIL package (since img[0] is a PIL object) and read it with a different method, but all I get is another error:

from PIL import Image    
pixel_img = Image.open(img[0])

AttributeError: 'PpmImageFile' object has no attribute 'read'

This link is not exactly what I want, because it just saves the PDF as a JPG. I don't want to save it; I just want to read it and get the pixel info.

Thanks

Upvotes: 4

Views: 6628

Answers (1)

Tankred

Reputation: 316

convert_from_path returns a list of PIL images, so you must not treat them as files.
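
To illustrate why both calls in the question fail, here is a minimal sketch that needs no PDF. It assumes a small solid-color image built with Image.new as a stand-in for one page returned by convert_from_path; the names page, buffer and reopened are only for illustration. A PIL image has no read() method, so it is not file-like, but its pixels can be read directly. Only if some API really requires a file-like object would you write the image into an in-memory buffer first.

```python
import io

from PIL import Image

# Stand-in for one page from convert_from_path (assumption: a 4x3 red RGB image).
page = Image.new("RGB", (4, 3), color=(255, 0, 0))

# A PIL image is already decoded pixel data, not a file: it has no read() method.
has_read = hasattr(page, "read")

# Pixels can be read straight from the image; getpixel takes (x, y).
pixel = page.getpixel((1, 2))

# If a file-like object is truly required, serialize into memory (nothing touches disk).
buffer = io.BytesIO()
page.save(buffer, format="PNG")  # PNG is lossless, so pixel values survive
buffer.seek(0)
reopened = Image.open(buffer)
reopened_pixel = reopened.getpixel((1, 2))
```

The round trip through BytesIO is shown only for completeness; for pixel access it is unnecessary work.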

The following converts the pages of a PDF to PIL images, converts the first page/image to a numpy array (for easy access to pixels) and gets the pixel at position y=10, x=15:

from pdf2image import convert_from_path
import numpy as np

images = convert_from_path('test.pdf')

# to numpy array
image = np.array(images[0])

# get pixel at position y=10, x=15
# where pix is an array of R, G, B.
# e.g. pix[0] is the red part of the pixel
pix = image[10,15]
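
One thing that is easy to trip over is the index order. The sketch below, again assuming a synthetic image from Image.new in place of a real PDF page (the sizes and colors are made up for illustration), shows that the NumPy array is indexed as [row, column], i.e. [y, x], while PIL's own methods take (x, y).

```python
import numpy as np
from PIL import Image

# Stand-in for a PDF page (assumption): width=20, height=30, uniform color.
page = Image.new("RGB", (20, 30), color=(10, 20, 30))

# Mark one pixel using PIL's (x, y) convention: x=15, y=10.
page.putpixel((15, 10), (200, 100, 50))

# Convert to a NumPy array; the shape is (height, width, channels).
image = np.array(page)

# NumPy indexing is [row, column], so the same pixel is image[y, x].
pix = image[10, 15]
red, green, blue = (int(v) for v in pix)

# Optional: a single-channel grayscale array of shape (height, width).
gray = np.array(page.convert("L"))
```

Here image.shape is (30, 20, 3) even though the image was created with size (20, 30), which is why image[10, 15] addresses y=10, x=15.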

Upvotes: 4
