Asia Vassos

Reputation: 1

Why is this code using PyMuPDF not extracting all the images in a PDF?

I'm trying to extract images from an invoice for an equipment order, but each time I run the code I get only 4 of the 8 or 9 photos on each page. Are some PDFs simply incompatible with some of PyMuPDF's functions?

import os

import fitz  # PyMuPDF


def extract_images(model_nums, file):
    image_num = 0

    doc = fitz.open(file)

    # new directories that will hold images
    # (sport_path is defined elsewhere in the script)
    all_path = os.path.join(os.getcwd(), "All Files")
    os.makedirs(all_path, exist_ok=True)
    os.makedirs(sport_path, exist_ok=True)

    for i in range(doc.page_count):
        print("Page: " + str(i))

        # each entry is a tuple whose first item is the image's xref
        images = doc.get_page_images(i)

        for img in images:
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            pix.save(os.path.join(all_path, f"{model_nums[image_num]}.jpg"))
            pix = None  # release the pixmap

            image_num += 1

I've also tried other people's code that simply counts the images in the PDF, and it reports the same low number.
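For reference, here is a minimal sketch of the kind of counting check I mean. It compares two numbers per page: the image xrefs listed by `page.get_images(full=True)` and the image blocks (type 1) returned by `page.get_text("dict")`. My understanding, which may be wrong, is that inline images have no xref and so would only show up in the second count, while photos that are really vector drawings would show up in neither. The names `summarize_counts` and `report_image_counts` are made up for this illustration and are not part of PyMuPDF.

```python
def summarize_counts(xref_entries, text_blocks):
    """Return (unique image xrefs, image blocks shown) for one page.

    xref_entries: tuples as returned by page.get_images(full=True),
                  where the first item of each tuple is the xref.
    text_blocks:  the "blocks" list from page.get_text("dict"),
                  where image blocks have "type" == 1.
    """
    unique_xrefs = {entry[0] for entry in xref_entries}
    image_blocks = [b for b in text_blocks if b.get("type") == 1]
    return len(unique_xrefs), len(image_blocks)


def report_image_counts(pdf_path):
    """Print both counts for every page of the PDF at pdf_path."""
    # Imported here so summarize_counts can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    doc = fitz.open(pdf_path)
    for page in doc:
        xref_entries = page.get_images(full=True)
        blocks = page.get_text("dict")["blocks"]
        n_xrefs, n_blocks = summarize_counts(xref_entries, blocks)
        print(f"Page {page.number}: {n_xrefs} image xrefs, "
              f"{n_blocks} image blocks")
    doc.close()
```

Calling `report_image_counts("invoice.pdf")` (a placeholder file name) prints one line per page. If the second number is higher than the first, some of the photos are probably not stored as ordinary image objects.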

Upvotes: 0

Views: 987

Answers (0)
