Replacing images with their image names in a PDF using PyMuPDF

Question

Using PyMuPDF, I want to extract all images from a PDF and save them as separate files, then replace each image in the PDF with its image name at the same position and save the result as a new document. I can save all the images with the following code:

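import fitz  # PyMuPDF

# Open the source PDF as a Document object.
# Note: current PyMuPDF releases use snake_case method names
# (get_page_images, Pixmap.save); the older camelCase names were removed.
doc = fitz.open("Article_Example_1_2.pdf")
for i in range(len(doc)):
    print(doc[i].get_contents())  # debug: xrefs of the page's content streams
    for img in doc.get_page_images(i):
        xref = img[0]  # cross-reference number of the image object
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha < 4:  # GRAY or RGB: can be written as PNG directly
            pix.save("p%s-%s.png" % (i, xref))
        else:  # CMYK: convert to RGB first
            pix1 = fitz.Pixmap(fitz.csRGB, pix)
            pix1.save("p%s-%s.png" % (i, xref))
            pix1 = None
        pix = None  # release the pixmap

# The document has not been modified yet, so this writes an unchanged copy.
doc.save("new.pdf")

doc.close()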

However, I am not sure how to replace each image in the PDF with the name of its saved file. I would greatly appreciate any help.


Answers (1)
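
One possible approach, offered as a minimal sketch rather than a definitive solution: for each image, look up where it is drawn on the page, remove it with a redaction, and write the saved file name into the same rectangle. The function names image_filename and replace_images_with_names are hypothetical helpers introduced for this example. The sketch assumes a reasonably recent PyMuPDF release that provides Page.get_image_rects, Page.add_redact_annot, Page.apply_redactions with the images argument, and Page.insert_textbox. Note that redaction also removes any text that overlaps the image rectangle, and that the font size of 8 is an arbitrary choice.

```python
def image_filename(page_index, xref):
    """Build the PNG file name used in the question's code (hypothetical helper)."""
    return "p%s-%s.png" % (page_index, xref)


def replace_images_with_names(src_path, dst_path):
    """Save every image as a PNG and put its file name where the image was.

    Hypothetical helper; a sketch under the assumptions stated above.
    """
    # Imported here so image_filename stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF

    doc = fitz.open(src_path)
    for page in doc:
        placements = []  # (rectangle, file name) pairs for this page
        for img in page.get_images(full=True):
            xref = img[0]
            name = image_filename(page.number, xref)

            # Save the image, converting CMYK to RGB first as in the question.
            pix = fitz.Pixmap(doc, xref)
            if pix.n - pix.alpha >= 4:
                pix = fitz.Pixmap(fitz.csRGB, pix)
            pix.save(name)
            pix = None

            # An image can be drawn more than once on the same page.
            for rect in page.get_image_rects(img):
                placements.append((rect, name))

        # Mark every image rectangle for removal, then apply all at once.
        for rect, _ in placements:
            page.add_redact_annot(rect)
        if placements:
            page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)

        # Write the file name into the space the image used to occupy.
        for rect, name in placements:
            # insert_textbox returns a negative number if the text did not fit.
            page.insert_textbox(rect, name, fontsize=8,
                                align=fitz.TEXT_ALIGN_CENTER)

    doc.save(dst_path)
    doc.close()
```

With the file names from the question, this would be called as replace_images_with_names("Article_Example_1_2.pdf", "new.pdf"). Very small images may not have room for the name at this font size, in which case the text box is left empty.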
