I found a way to convert a PDF file to JPG, or more precisely, to extract the images embedded in a PDF file. I managed to do that with the PyMuPDF library.
This is the documentation for that library:
https://pymupdf.readthedocs.io/en/latest/
I have seen this code:
Extract images from PDF without resampling, in python?
and this code:
https://www.thepythoncode.com/article/extract-pdf-images-in-python
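I wrote the following code, which runs without any errors:
import fitz  # PyMuPDF
import cv2
import numpy as np

doc = fitz.open("sample15.pdf")
#print(doc)
my_images = []
for page_no in range(len(doc)):
    # get_page_images() / extract_image() are the current PyMuPDF names;
    # older releases called them getPageImageList() / extractImage().
    for img_info in doc.get_page_images(page_no):
        xref = img_info[0]                              # cross-reference number of the image
        raw_bytes = doc.extract_image(xref)["image"]    # original encoded bytes (JPEG here)
        nparr = np.frombuffer(raw_bytes, np.uint8)
        img_np = cv2.imdecode(nparr, cv2.IMREAD_COLOR)  # decode into a BGR NumPy array
        my_images.append(img_np)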
As you can see, I don't call print anywhere, but my program prints this (the first line appears in red):
mupdf: expected object number
xref 9 image type jpeg
xref 12 image type jpeg
xref 15 image type jpeg
xref 18 image type jpeg
xref 21 image type jpeg
xref 24 image type jpeg
Why do I get this output, and how can I suppress it? I guess it's coming from the library, but I don't know how to turn it off.
That output probably comes from one of the libraries you're using. You could check their docs to see whether there's a logging-level option or, as a last-ditch "fix", use the contextlib.redirect_stdout (and contextlib.redirect_stderr) context managers to hide the output.
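The last-ditch approach can be sketched with the standard library alone. `noisy_library_call()` and `silenced()` are hypothetical names used only for illustration; in practice you would wrap the PyMuPDF calls from the question. Before reaching for this, it may be worth checking whether PyMuPDF has its own switch: its documentation lists a `TOOLS.mupdf_display_errors()` setting that appears to control whether MuPDF error messages are shown, though availability may depend on your version.

```python
import contextlib
import io
import os
import sys


def noisy_library_call():
    # Hypothetical stand-in for a library function that prints diagnostics.
    print("xref 9 image type jpeg")                          # goes to sys.stdout
    print("mupdf: expected object number", file=sys.stderr)  # goes to sys.stderr
    return 42


@contextlib.contextmanager
def silenced():
    """Discard everything written to sys.stdout / sys.stderr inside the block."""
    with open(os.devnull, "w") as devnull, \
            contextlib.redirect_stdout(devnull), \
            contextlib.redirect_stderr(devnull):
        yield


# Option 1: throw the output away.
with silenced():
    result = noisy_library_call()

# Option 2: keep the output in memory, e.g. to inspect or log it later.
captured_out, captured_err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    noisy_library_call()

print(result)  # printing works normally again once the with-blocks exit
```

These context managers only swap out the Python-level sys.stdout and sys.stderr objects. Text that C code writes straight to the process's file descriptors is not affected, so if a message still shows up, it is probably bypassing Python's streams. The redirection is also global to the process, which is why the contextlib documentation warns against using it in library code or in most threaded applications.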