Maxima

Reputation: 352

convert pdf file pages to images - Wand

Beginner here:

My code runs fine when I use it on a single PDF, but as soon as I add a for loop over several files, the code still runs but only converts the first page of each multipage PDF instead of all pages.

For example, if my PDF is xyz.pdf with 2 pages, it converts both pages to JPG and outputs them separately. But as soon as I run my code on both xyz.pdf and abc.pdf, it only converts the first page of each PDF.

What am I missing here?

import os

from wand.image import Image as wi

# pdf_dir is the directory containing the PDFs (assumed to be defined earlier)
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        pdf = wi(filename=os.path.join(pdf_dir, pdf_file), resolution=300)
        pdfimage = pdf.convert("jpeg")
        i = 1
        for img in pdfimage.sequence:
            page = wi(image=img)
            page.save(filename=os.path.join(pdf_dir, str(pdf_file[:-4] + ".jpg")))
            i += 1

Upvotes: 0

Views: 671

Answers (2)

jiggy

Reputation: 272

Although this question is quite old, no answer here explains what was wrong with your code. The filename you save each page image to is just the PDF filename with a .jpg extension, which is the same for every page, so the file gets overwritten on every iteration. It should work with something like:

page.save(filename=os.path.join(pdf_dir, f"{pdf_file[:-4]}_page{i}_.jpg"))
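To make the overwriting visible without running Wand at all, here is a minimal sketch that only builds the output paths. The helper names page_paths_buggy and page_paths_fixed, the "pdfs" directory, and the exact "_page{i}" naming are hypothetical and chosen for illustration; the first helper mirrors the filename expression from the question, the second includes the page counter.

```python
import os


def page_paths_buggy(pdf_dir, pdf_file, page_count):
    # Mirrors the question's expression: the page counter is never used,
    # so every page of one PDF maps to the same output path.
    return [os.path.join(pdf_dir, pdf_file[:-4] + ".jpg")
            for _ in range(page_count)]


def page_paths_fixed(pdf_dir, pdf_file, page_count):
    # Includes a 1-based page number so each page gets its own file.
    stem = os.path.splitext(pdf_file)[0]
    return [os.path.join(pdf_dir, f"{stem}_page{i}.jpg")
            for i in range(1, page_count + 1)]


if __name__ == "__main__":
    # Hypothetical two-page PDF in a hypothetical "pdfs" directory.
    print(page_paths_buggy("pdfs", "xyz.pdf", 2))
    print(page_paths_fixed("pdfs", "xyz.pdf", 2))
```

With the first helper both pages resolve to one path, so only a single file per PDF survives on disk; with the second, each page has a distinct path.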

Upvotes: 0

tsamaya

Reputation: 396

This works for me:

import os

from wand.image import Image as wi


def convert_pdf(filename, output_path, resolution=150):
    # Base name of the PDF without its extension, e.g. "xyz" for "xyz.pdf"
    base_name = os.path.splitext(os.path.basename(filename))[0]
    with wi(filename=filename, resolution=resolution) as all_pages:
        for i, page in enumerate(all_pages.sequence):
            with wi(page) as img:
                # The page index (starting at 0) keeps every output name unique
                image_filename = '{}-{}.jpg'.format(base_name, i)
                img.save(filename=os.path.join(output_path, image_filename))


# pdf_dir is the directory containing the PDFs (assumed to be defined earlier)
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        convert_pdf(os.path.join(pdf_dir, pdf_file), pdf_dir)

Upvotes: 2
