Reputation: 119
I want to convert an image-based PDF to an image (.png/.jpg) file in Python, so I can then use this image for extracting tabular data from it. I do not want to run the code from the command line.
I am currently using Python 3.7.1 and the PyCharm IDE.
I have tried the code provided on Stack Overflow, but nothing works: it runs but is unable to extract the image from the image-based PDF file. Below is the link for it. Extracting images from pdf using Python
I also tried the code from dzone.com (link below), but that does not work either: https://dzone.com/articles/exporting-data-from-pdfs-with-python
Below are the image-based PDF file links:
link1: https://www.molex.com/pdm_docs/sd/190390001_sd.pdf
Please suggest any solution for this.
Upvotes: 1
Views: 5455
Reputation: 3856
The pdf2image library converts PDF pages to images. Looking at your PDFs, they contain nothing but images, so you can convert each page directly to an image.
Install
pip install pdf2image
Once installed, you can use the following code to get the images.
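from pdf2image import convert_from_path

# Render every page at 500 DPI; replace 'pdf_file' with the path to your PDF
pages = convert_from_path('pdf_file', 500)

# Save each page as its own JPEG so later pages do not overwrite earlier ones
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')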
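If the PDF has more than one page, or you want PNG output for the later table extraction, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the names `page_filename` and `pdf_to_images`, the `pages` output folder, the file naming scheme and the 300 DPI default are all made up for illustration and are not part of pdf2image. Note also that pdf2image relies on the Poppler command-line tools being installed on your machine, although you still call everything from Python (for example from PyCharm) rather than from a terminal.

```python
from pathlib import Path


def page_filename(pdf_path, page_number, ext="png"):
    """Build a per-page output name (hypothetical naming scheme)."""
    return f"{Path(pdf_path).stem}_page_{page_number:03d}.{ext}"


def pdf_to_images(pdf_path, out_dir="pages", dpi=300, ext="png"):
    """Render every page of pdf_path to its own image file and return the paths."""
    # Imported here so page_filename works even where pdf2image is not installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        target = out_dir / page_filename(pdf_path, number, ext)
        page.save(target)  # Pillow infers the format from the file extension
        saved.append(target)
    return saved
```

Calling `pdf_to_images('190390001_sd.pdf')` on a local copy of your file should then write one PNG per page into the `pages` folder. A higher `dpi` gives sharper images for table extraction but produces larger files and takes longer.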
Upvotes: 4