Converting image-based PDF to image file (PNG/JPG) in Python

Question

I want to convert an image-based PDF to an image file (.png/.jpg) in Python, so that I can then use the image to extract tabular data from it. I do not want to run the code from the command line.

I am currently using Python 3.7.1 and the PyCharm IDE.

I have tried the code from the Stack Overflow question "Extracting images from pdf using Python", but it does not work: it runs, but it fails to extract the image from the image-based PDF file.

I also tried the code from dzone.com, but it does not work either: https://dzone.com/articles/exporting-data-from-pdfs-with-python

Below are the image-based PDF file links:

link1: https://www.molex.com/pdm_docs/sd/190390001_sd.pdf

link2: https://www.te.com/commerce/DocumentDelivery/DDEController?Action=showdoc&DocId=Customer+Drawing%7FDT04-12PX-C015%7F-%7Fpdf%7FEnglish%7FENG_CD_DT04-12PX-C015_-.pdf%7FDT04-12PA-C015

Please suggest any solution for this.

Kuldeep Singh Sidhu · Accepted Answer

The pdf2image library converts PDFs to images. Looking at your PDFs, each page is just an image and nothing else, so you can convert each page to an image file.

Install

pip install pdf2image

Once installed, you can use the following code to get the images.

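from pdf2image import convert_from_path

# Render every page of the PDF as a PIL image at 500 DPI
pages = convert_from_path('pdf_file', 500)

# Save each page under its own name so later pages do not overwrite earlier ones
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')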
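
Note that pdf2image is a wrapper around Poppler, which has to be installed separately; on Windows you may need to add Poppler's bin folder to PATH or pass its location through the poppler_path argument of convert_from_path. As a rough sketch of how the snippet above could be wrapped into a reusable function that also writes PNG, something like the following might work. The function names, the file naming scheme and the default of 300 DPI are my own assumptions, not part of pdf2image.

```python
from pathlib import Path


def page_filename(pdf_path, page_number, ext="png"):
    """Build a name such as 'drawing_page_001.png' (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return f"{stem}_page_{page_number:03d}.{ext}"


def pdf_to_images(pdf_path, out_dir=".", dpi=300, ext="png"):
    """Render every page of pdf_path, save it into out_dir, return the saved paths."""
    # Imported here so the naming helper above works without pdf2image installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)
    for number, page in enumerate(pages, start=1):
        target = out_dir / page_filename(pdf_path, number, ext)
        page.save(target)  # Pillow picks PNG or JPEG from the file extension
        saved.append(target)
    return saved
```

Calling pdf_to_images('190390001_sd.pdf', out_dir='pages') from a script or from PyCharm should then leave one image per page in the pages folder, with no command line involved. PNG is lossless, which tends to suit line drawings and tables better than JPEG if the image is later fed to OCR or table extraction.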
