How to remove text from a PDF page and save the result using Python

Question

I'm using the PyMuPDF library, which offers various functions for working with PDF documents in Python.

What I want to achieve: I would like to extract all the images and their labels (e.g. "Figure 1: XYZ") from a PDF document. I cannot use the typical extraction methods, because the images are not raster images: they are vector graphics containing machine-readable text. So I would like to render the PDF page showing just the image.

Where I am now: I am able to narrow the search down to the pages that contain images, convert each of those PDF pages into an image, and name the file after its label.

I'm hoping there is a way to remove all the text from the page; then I could save the image file with just the image (and some white space, which is fine).
