Using PDFTron in Python, remove all image elements from a PDF with given size characteristics

Question

I'm trying to remove a large number of very small images from a series of PDF documents using the awesome-looking PDFTron library for Python. Basically, I want to create a new PDF by going over each element in an existing PDF file and copying the ones that meet a certain size criterion to the new PDF in the same position.

Can someone point me to PDFTron documentation specifically for Python to help me accomplish this? Or provide a sample script that checks for image size? I think I can do the rest (emphasis on think). The documentation on the PDFTron website is not specific to Python, so it is hard to find what I need.

user3609640 · Accepted Answer

You can see from the ElementEdit sample how to remove all images from a document:

http://www.pdftron.com/pdfnet/samplecode.html#ElementEdit
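A minimal sketch of that pattern adapted to the question, assuming the pip-installable PDFNetPython3 package (older releases expose the module as PDFNetPython, and newer ones expect a license key in PDFNet.Initialize). Instead of building a second document, it rewrites each page's content stream in place with ElementWriter.e_replacement and saves the result under a new file name, so every surviving element keeps its original position. The names is_small_image and remove_small_images and the 16-pixel threshold are hypothetical. Images nested inside Form XObjects are copied untouched here; handling them means descending into each form's content stream, as the full ElementEdit sample does.

```python
def is_small_image(width_px, height_px, max_px=16):
    """Hypothetical criterion: 'small' means both pixel dimensions are <= max_px."""
    return width_px <= max_px and height_px <= max_px


def remove_small_images(input_path, output_path, max_px=16):
    # Imported lazily so is_small_image() can be used without PDFNet installed.
    # The module path is an assumption based on the PDFNetPython3 pip package.
    from PDFNetPython3.PDFNetPython import (
        PDFNet, PDFDoc, SDFDoc, Element, ElementReader, ElementWriter)

    PDFNet.Initialize()  # newer releases expect a license key argument here
    doc = PDFDoc(input_path)
    doc.InitSecurityHandler()
    reader = ElementReader()
    writer = ElementWriter()

    itr = doc.GetPageIterator()
    while itr.HasNext():
        page = itr.Current()
        reader.Begin(page)
        # e_replacement: the page content is replaced by whatever we write back.
        writer.Begin(page, ElementWriter.e_replacement, False)
        element = reader.Next()
        while element is not None:
            is_image = element.GetType() in (Element.e_image, Element.e_inline_image)
            if not (is_image and is_small_image(element.GetImageWidth(),
                                                element.GetImageHeight(), max_px)):
                # Copy text, paths, larger images and forms through unchanged.
                writer.WriteElement(element)
            # Small images are simply not written, which drops them from the page.
            element = reader.Next()
        writer.End()
        reader.End()
        itr.Next()

    # Ask PDFNet to drop unreferenced objects when saving the new file.
    doc.Save(output_path, SDFDoc.e_remove_unused)
    doc.Close()
```

Usage would be something like remove_small_images("in.pdf", "out.pdf", max_px=8). Filtering on pixel dimensions is only one possible criterion; the check can be swapped for the displayed size or the data size without changing the loop.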

> Or provide a sample script that checks for image size?

Could you clarify what you mean by "image size"? If you mean the image's dimensions as displayed on the PDF page, you can check them using Element.GetBBox. If you mean the dimensions of the original image, use Element.GetImageWidth and Element.GetImageHeight (see http://www.pdftron.com/pdfnet/samplecode.html#ImageExtract). Also, Image.GetImageDataSize gives you the size of the image data in bytes.
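The three notions of size can be gathered per image element and combined into one removal rule. This is a sketch under assumptions: it reads the data size through Element.GetImageDataSize (which also covers inline images) rather than through an Image object, and it assumes the Python binding's Element.GetBBox() returns a Rect with Width() and Height(); check that signature in your PDFNet release. ImageMeasure, measure, should_remove and the default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImageMeasure:
    width_px: int     # Element.GetImageWidth(): pixel width of the source image
    height_px: int    # Element.GetImageHeight(): pixel height of the source image
    width_pt: float   # bounding-box width on the page, in PDF points
    height_pt: float  # bounding-box height on the page, in PDF points
    data_bytes: int   # Element.GetImageDataSize(): size of the image data in bytes


def measure(element):
    """Collect the three notions of 'size' for an image Element.

    Assumes element.GetBBox() returns a Rect (an assumption about the binding).
    """
    bbox = element.GetBBox()
    return ImageMeasure(
        width_px=element.GetImageWidth(),
        height_px=element.GetImageHeight(),
        width_pt=bbox.Width(),
        height_pt=bbox.Height(),
        data_bytes=element.GetImageDataSize(),
    )


def should_remove(m, max_pt=10.0, max_px=None, max_bytes=None):
    """Hypothetical policy: remove when the displayed box is small in both
    directions and, if given, the pixel size and data size are small too."""
    if m.width_pt > max_pt or m.height_pt > max_pt:
        return False
    if max_px is not None and (m.width_px > max_px or m.height_px > max_px):
        return False
    if max_bytes is not None and m.data_bytes > max_bytes:
        return False
    return True
```

Inside an ElementEdit-style loop, you would call should_remove(measure(element)) only for e_image and e_inline_image elements and skip writing those for which it returns True. Keeping the measurement separate from the policy lets you tune thresholds without touching any PDFNet calls.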
