Automate downloading embedded PDF files, using Python

Question

I am writing a Python script to automate downloading some PDF pages (from a public-domain work) hosted on a website. Unfortunately, the individual PDF pages are embedded in frames, and when I used the following:

import time, urllib
for n in range(21,63):
    time.sleep(2)
    pdfPath="http://babel.hathitrust.org/cgi/imgsrv/download/pdf?id=wu.89038803698;orient=0;size=100;seq=%s;attachment=0"%(str(n))
    pdfName="Housner_"+str(n)+".pdf"
    f = open(pdfName, 'w')
    f.write(urllib.urlopen(pdfPath).read())
    f.close()
    time.sleep(2)

the downloaded files were blank, and Adobe reports errors such as an invalid image and embedded fonts not being found.

Can anyone suggest how to improve this script so that the downloaded PDFs are not corrupt?

Thanks.

leongold · Accepted Answer

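Replace 'w' with 'wb' in open(pdfName, 'w'). A PDF is binary data, so the file has to be opened in binary mode. In text mode on Windows, newline bytes are translated as they are written, which corrupts the file.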
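For illustration, here is a minimal sketch of the loop with that change applied. It assumes Python 3, where urlopen lives in urllib.request (the question's code uses the Python 2 urllib.urlopen). The helper names fetch and download_pages and the delay parameter are made up for this example and are not part of the original script.

```python
import time
import urllib.request

# URL pattern taken from the question; %d is the page sequence number.
URL_TEMPLATE = ("http://babel.hathitrust.org/cgi/imgsrv/download/pdf"
                "?id=wu.89038803698;orient=0;size=100;seq=%d;attachment=0")


def fetch(url):
    """Return the raw bytes served at url (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_pages(first, last, fetch=fetch, delay=2.0):
    """Save pages first..last (inclusive) as Housner_<n>.pdf.

    Returns the list of file names written. The fetch argument can be
    swapped out, e.g. for testing without network access.
    """
    saved = []
    for n in range(first, last + 1):
        data = fetch(URL_TEMPLATE % n)
        name = "Housner_%d.pdf" % n
        # 'wb' writes the bytes unchanged; 'w' would treat them as text.
        with open(name, "wb") as f:
            f.write(data)
        saved.append(name)
        time.sleep(delay)  # pause between requests, as in the question
    return saved
```

Calling download_pages(21, 62) would cover the same pages as range(21, 63) in the question. The with blocks close the file and the HTTP response even if a write or read fails partway through.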
