Parsh

Reputation: 347

How to remove text from a PDF page and save the result using Python

I'm using the PyMuPDF library (Documentation), which offers various functions for working with PDF documents in Python.

What I want to achieve: I would like to extract all the images and their labels (e.g. "Figure 1: XYZ") from a PDF document. I cannot use the typical image extraction methods because the images are not raster; they are vector graphics with machine-readable text. So I would like to render the PDF page with just the image on it.

Where I am now: I am able to narrow down to the pages that contain images, convert each PDF page into an image, and name the file after the image's label.

I'm hoping there is a way to remove all text from the page, so that I could then save the image file with just the image (and some white space, which should be fine).

Upvotes: 0

Views: 499

Answers (1)

Imran Habib

Reputation: 61

I don't know Python, but this is something that can easily be done using UniPDF. It has built-in code for many functions, and you can customize the code based on your needs. See the examples at https://github.com/unidoc/unipdf-examples.

I am confident this will help you a lot.

Upvotes: 0
