I have some PDFs, and I want to read them as images to get all the pixel information.
So I first tried to convert the PDF into images:
from pdf2image import convert_from_path
img = convert_from_path('mypdf.pdf')
This works. Next I tried to get the pixel information, but I get an error:
import matplotlib.pyplot as plt
pixel_img = plt.imread(img[0])
TypeError: Object does not appear to be a 8-bit string path or a Python file-like object
I don't understand this, because plt.imread() works when I use it to read an original .jpeg file. img[0] is a PIL object, so shouldn't it count as a "Python file-like object"?
I also tried the PIL package (since img[0] is a PIL object) with a different reading method, but all I get is another error:
from PIL import Image
pixel_img = Image.open(img[0])
AttributeError: 'PpmImageFile' object has no attribute 'read'
This link is not exactly what I want, because it just saves the PDF as a JPEG. I don't want to save it; I just want to read it and get the pixel information.
Thanks
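convert_from_path returns a list of PIL images, so you must not treat them as files. Image.open() and plt.imread() expect a file path or a file-like object (something with a read() method), whereas each element of that list is an image that is already decoded in memory.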
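A minimal sketch of that difference, using a small image built with Image.new as a hypothetical stand-in for one page from convert_from_path (assumed to be RGB, like the PPM pages in the question). The in-memory BytesIO buffer is only there to show what a file-like object looks like; nothing is written to disk, and you don't need it to read pixels.

```python
import io

from PIL import Image

# Hypothetical stand-in for one page returned by convert_from_path.
page = Image.new("RGB", (40, 20), color=(255, 255, 255))
page.putpixel((15, 10), (200, 30, 60))

# A PIL Image has no read() method, which is why Image.open(page)
# and plt.imread(page) reject it.
print(hasattr(page, "read"))  # False

# An in-memory buffer IS file-like, so Image.open accepts it.
buf = io.BytesIO()
page.save(buf, format="PNG")  # lossless, stays in memory
buf.seek(0)
print(hasattr(buf, "read"))  # True
reopened = Image.open(buf)

# The round trip is unnecessary: read pixels straight from the Image.
# Note that PIL's getpixel takes (x, y).
print(page.getpixel((15, 10)))  # (200, 30, 60)
```

In practice you would skip the buffer entirely and call getpixel (or convert to a NumPy array) on the page objects directly.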
The following converts the pages of a PDF to PIL images, converts the first page/image to a numpy array (for easy access to pixels) and gets the pixel at position y=10, x=15:
from pdf2image import convert_from_path
import numpy as np
images = convert_from_path('test.pdf')
# to numpy array
image = np.array(images[0])
# get pixel at position y=10, x=15
# pix is a 3-element array holding the R, G and B values,
# e.g. pix[0] is the red component of the pixel
pix = image[10,15]
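To process every page rather than just the first, the same conversion can be applied to each element of the list. The sketch below uses two images made with Image.new as hypothetical stand-ins for convert_from_path output, so it runs without a PDF. It also checks the axis order: the NumPy array is indexed [y, x] (row, column), while PIL's getpixel takes (x, y).

```python
import numpy as np
from PIL import Image

# Hypothetical stand-ins for the list returned by convert_from_path:
# two small RGB "pages" (real pages would be much larger).
pages = [Image.new("RGB", (40, 20), color=(255, 255, 255)) for _ in range(2)]
pages[0].putpixel((15, 10), (200, 30, 60))

# Convert every page to a (height, width, 3) uint8 array.
arrays = [np.array(p) for p in pages]
print(arrays[0].shape, arrays[0].dtype)  # (20, 40, 3) uint8

# NumPy indexes [row, column] = [y, x]; getpixel takes (x, y).
pix = arrays[0][10, 15]
print(pix.tolist())  # [200, 30, 60]
assert pix.tolist() == list(pages[0].getpixel((15, 10)))
```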