Reputation: 1168
I'd like to turn a multipage PDF document into a list of image objects in Python, without saving the images to disk (I'd like to process them as PIL Image objects). So far, I can only do this by writing the images to files first:
from wand.image import Image
with Image(filename='source.pdf') as img:
    with img.convert('png') as converted:
        converted.save(filename='pyout/page.png')
But how can I turn the img objects above directly into a list of PIL.Image objects?
Upvotes: 12
Views: 38691
Reputation: 1689
Download Poppler from https://blog.alivate.com.au/poppler-windows/, then use the following code:
from pdf2image import convert_from_path
file_name = 'A019'
images = convert_from_path(r'D:\{}.pdf'.format(file_name), poppler_path=r'C:\poppler-0.68.0\bin')
for i, im in enumerate(images):
    im.save(r'D:\{}-{}.jpg'.format(file_name, i))
If you get an error about Poppler's path, add Poppler's bin directory to the "Path" variable in the Windows environment variables. The entry can look like this: "C:\poppler-0.68.0\bin"
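To check whether the PATH change took effect, something like the following sketch might help. It only uses the standard library and looks for pdftoppm and pdfinfo, the Poppler command-line tools that pdf2image relies on as far as I know. The helper name find_poppler_binaries is made up for this example.

```python
import shutil


def find_poppler_binaries(names=("pdftoppm", "pdfinfo")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    The helper name and the default tool list are assumptions for illustration.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, location in find_poppler_binaries().items():
        print(tool, "->", location or "not found on PATH")
```

If both tools resolve to a path, the poppler_path argument should not be needed in convert_from_path.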
Upvotes: -1
Reputation: 11
My answer using wand is the following:
from wand.image import Image as wi
...
Data = filedialog.askopenfilename(initialdir="/", title="Choose File", filetypes = (("Portable Document Format","*.pdf"),("All Files", "*.*")))
apps.append(Data)
print(Data)
PDFfile = wi(filename = Data, resolution = 300)
Images = PDFfile.convert('tiff')
ImageSequence = 1
for img in PDFfile.sequence:
    image = wi(image = img)
    image.save(filename = "Document_300"+"_"+str(ImageSequence)+".tiff")
    ImageSequence += 1
Hopefully this will help you.
I've implemented it with a GUI where you can simply choose your file.
You can also change the format passed to PDFfile.convert(), for example to 'jpg'.
Upvotes: 1
Reputation: 180
A simple way is to save the image files and delete them after reading them with PIL.
I recommend using the pdf2image package. Before using it, you might need to install the poppler package via Anaconda:
conda install -c conda-forge poppler
If you get stuck, update conda before installing:
conda update conda
conda update anaconda
After installing poppler, install pdf2image via pip:
pip install pdf2image
Then run this code:
from pdf2image import convert_from_path
dpi = 500 # dots per inch
pdf_file = 'work.pdf'
pages = convert_from_path(pdf_file, dpi=dpi)
# Save each rendered page as a numbered JPEG file
for i, page in enumerate(pages):
    page.save('output_{}.jpg'.format(i), 'JPEG')
After this, please read them using PIL and delete them.
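The save, read, and delete steps could be wrapped up roughly as follows. This is only a sketch: the function name pdf_to_pil_images and the default dpi are assumptions, and it uses a temporary directory so the files are removed automatically. As far as I know, the objects returned by convert_from_path are already PIL images, so the round trip through disk may not be needed at all.

```python
import tempfile
from pathlib import Path


def pdf_to_pil_images(pdf_file, dpi=200):
    """Render each page to a temporary JPEG, reload it with PIL, then clean up.

    Hypothetical helper; requires pdf2image, Pillow and Poppler to be installed.
    """
    # Imported lazily so this module can be loaded without the third-party packages.
    from pdf2image import convert_from_path
    from PIL import Image

    images = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        pages = convert_from_path(pdf_file, dpi=dpi)
        for i, page in enumerate(pages):
            out_path = Path(tmp_dir) / "output_{}.jpg".format(i)
            page.save(out_path, "JPEG")
            # copy() loads the pixel data, so the file can be deleted afterwards
            with Image.open(out_path) as reloaded:
                images.append(reloaded.copy())
    # The temporary directory and its files are gone at this point.
    return images
```

Calling pdf_to_pil_images('work.pdf') would then return a list with one image per page.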
Upvotes: 5
Reputation: 1739
pip install pdf2image
from pdf2image import convert_from_path, convert_from_bytes
images = convert_from_path('/path/to/my.pdf')
You may need to install Pillow as well. This might only work on Linux.
https://github.com/Belval/pdf2image
Results may differ between pdf2image and the wand approach below.
Python 3.4:
from PIL import Image
from wand.image import Image as wimage
import os
import io
if __name__ == "__main__":
    filepath = "fill this in"
    assert os.path.exists(filepath)
    page_images = []
    with wimage(filename=filepath, resolution=200) as img:
        for page_wand_image_seq in img.sequence:
            page_wand_image = wimage(page_wand_image_seq)
            page_jpeg_bytes = page_wand_image.make_blob(format="jpeg")
            page_jpeg_data = io.BytesIO(page_jpeg_bytes)
            page_image = Image.open(page_jpeg_data)
            page_images.append(page_image)
Lastly, you can make a system call to mogrify, but that can be more complicated as you need to manage temporary files.
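For the mogrify route, a rough sketch of the temporary-file handling might look like this. It assumes the ImageMagick mogrify executable is on PATH (newer installs may need it invoked as magick mogrify) and that multipage output is written as name-0.png, name-1.png, and so on. The helper names are made up for this example.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def build_mogrify_command(pdf_path, out_dir, density=200):
    """Build the mogrify argument list; -path keeps the output out of the source folder."""
    return ["mogrify", "-density", str(density), "-format", "png",
            "-path", str(out_dir), str(pdf_path)]


def page_index(path):
    """Extract the trailing page number from names like 'doc-12.png' (0 if absent)."""
    match = re.search(r"-(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else 0


def pdf_to_images_via_mogrify(pdf_path, density=200):
    """Run mogrify into a temporary directory and load the pages with PIL."""
    from PIL import Image  # imported lazily; requires Pillow

    images = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        subprocess.run(build_mogrify_command(pdf_path, tmp_dir, density), check=True)
        # Sort numerically so page 10 does not come before page 2
        for png in sorted(Path(tmp_dir).glob("*.png"), key=page_index):
            with Image.open(png) as page:
                images.append(page.copy())
    return images
```

The numeric sort is there because a plain alphabetical sort would put page 10 ahead of page 2.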
Upvotes: 8