Why is this code using PyMuPDF not extracting all the images in a PDF?

Question

I'm trying to extract images from an invoice for an equipment order, but each time I run the code I only get 4 of the 8 or 9 photos on each page. Are some PDFs simply not compatible with some of PyMuPDF's functions?

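import os

import fitz  # PyMuPDF


def extract_images(model_nums, file):
    image_num = 0

    doc = fitz.open(file)

    # new directories that will hold images
    all_path = os.path.join(os.getcwd(), "All Files")
    os.makedirs(all_path, exist_ok=True)
    # sport_id and sport_path are not defined in this snippet;
    # they are assumed to be set elsewhere in the script
    if not os.path.exists(sport_id):
        os.mkdir(sport_path)

    for i in range(doc.page_count):
        print("Page: " + str(i))

        # every image XObject referenced by page i
        images = doc.get_page_images(i)

        for img in images:
            xref = img[0]  # first item of each entry is the image's xref
            pix = fitz.Pixmap(doc, xref)
            pix.save(os.path.join(all_path, f"{model_nums[image_num]}.jpg"))
            pix = None  # release the pixmap

            image_num += 1

    doc.close()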
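
A variant of the loop above, offered only as a minimal sketch, writes each image's raw bytes with `Document.extract_image` instead of re-encoding through a `Pixmap`. It names files by page number and xref, so it does not depend on `model_nums` having one entry per image. Whether either of those points is related to the missing photos is not known. The function name `save_page_images`, the wrapper `save_images_from_file`, and the file-naming scheme are assumptions made for illustration.

```python
import os


def save_page_images(doc, out_dir):
    """Write every image referenced by each page to out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for pno in range(doc.page_count):
        for img in doc.get_page_images(pno):
            xref = img[0]
            # extract_image returns a dict holding the raw bytes ("image")
            # and the native file extension ("ext")
            info = doc.extract_image(xref)
            if not info:
                continue  # xref did not resolve to an extractable image
            # Hypothetical naming scheme: page number plus xref
            path = os.path.join(out_dir, f"page{pno}_xref{xref}.{info['ext']}")
            with open(path, "wb") as fh:
                fh.write(info["image"])
            saved.append(path)
    return saved


def save_images_from_file(pdf_path, out_dir):
    """Hypothetical convenience wrapper that opens the PDF with PyMuPDF."""
    import fitz  # PyMuPDF, imported here so the helper above stays dependency-free

    with fitz.open(pdf_path) as doc:
        return save_page_images(doc, out_dir)
```

Calling `save_images_from_file(file, "All Files")` would then return the list of paths written, which can be compared against the number of photos visible on each page.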

I've even tried code from other people that just counts the number of images, and it shows the same issue.
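
A counting check of that kind can be sketched as follows. It compares the images referenced by each page (`Page.get_images`) with the images actually drawn on it (`Page.get_image_info`). It assumes a PyMuPDF version whose `get_image_info` accepts `xrefs=True`, and that images without an xref (inline images) are reported with `xref` 0. The function names `count_images_per_page` and `report` are hypothetical. If the photos appear in neither count, they may not be stored as image objects at all (for example vector drawings), but that is a guess, not something confirmed for this PDF.

```python
def count_images_per_page(doc):
    """Return a list of (page_number, referenced, displayed, inline) tuples."""
    rows = []
    for page in doc:
        # Distinct image XObjects referenced by the page's resources
        referenced = {img[0] for img in page.get_images()}
        # Every image drawn on the page, one dict per occurrence
        shown = page.get_image_info(xrefs=True)
        # Assumption: an xref of 0 marks an image with no XObject (inline)
        inline = sum(1 for info in shown if info["xref"] == 0)
        rows.append((page.number, len(referenced), len(shown), inline))
    return rows


def report(pdf_path):
    """Hypothetical helper: print the counts for a PDF on disk."""
    import fitz  # PyMuPDF, imported here so the counting logic stays dependency-free

    with fitz.open(pdf_path) as doc:
        for pno, referenced, shown, inline in count_images_per_page(doc):
            print(f"Page {pno}: {referenced} referenced, "
                  f"{shown} displayed, {inline} inline")
```

A page where the displayed count is higher than the referenced count would point to inline images, which an xref-based loop cannot reach.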


Answers (0)
