How to download a PDF file on a web page in Python

Question

I'm trying to download the PDF file at the URL below using Python.

The download completes, but I can't open the saved file.
My PDF viewer reports "The format of source is not PDF."
Could someone tell me what is wrong?

import urllib2

def main():
    url = "https://www.osapublishing.org/view_article.cfm?gotourl=https%3A%2F%2Fwww%2Eosapublishing%2Eorg%2FDirectPDFAccess%2F42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099%2Foe-21-22-27371%2Epdf%3Fda%3D1%26id%3D274099%26seq%3D0%26mobile%3Dno&org="
    download_file("example", url)

def download_file(file_name, download_url):
    response = urllib2.urlopen(download_url)
    file = open(file_name + ".pdf", 'wb')
    file.write(response.read())
    file.close()
    print("Completed")

if __name__ == "__main__":
    main()

Selcuk · Accepted Answer

Your URL is not a link to a PDF but to an HTML frame that contains the PDF. Use the direct URL instead:

url = "http://www.osapublishing.org/DirectPDFAccess/42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099/oe-21-22-27371.pdf?da=1&id=274099&seq=0&mobile=no"

You can find the direct URL of the PDF file by viewing the HTML source of your original link.
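
In this particular link the direct URL also appears percent-encoded in the gotourl query parameter, so it can be recovered without opening the page source. The sketch below shows one way to do that and to confirm the response really is a PDF before saving it. It assumes Python 3 (urllib.parse and urllib.request instead of urllib2); the helper names extract_direct_url, looks_like_pdf and download_pdf are made up for illustration, and whether the site serves the file without cookies or a login is not something this code can guarantee.

```python
from urllib.parse import urlsplit, parse_qs
from urllib.request import urlopen


def extract_direct_url(wrapper_url):
    """Return the decoded 'gotourl' query parameter, or None if it is absent.

    Assumption: the wrapper page carries the real PDF address in 'gotourl',
    as the URL in the question does.
    """
    query = parse_qs(urlsplit(wrapper_url).query)
    values = query.get("gotourl")
    return values[0] if values else None


def looks_like_pdf(data):
    """A PDF file starts with the bytes '%PDF-'; an HTML page does not."""
    return data.startswith(b"%PDF-")


def download_pdf(file_name, download_url):
    """Download download_url and save it as file_name + '.pdf' if it is a PDF."""
    data = urlopen(download_url).read()
    if not looks_like_pdf(data):
        # Most likely an HTML wrapper page was returned instead of the file.
        raise ValueError("Response is not a PDF")
    with open(file_name + ".pdf", "wb") as f:
        f.write(data)
```

With these helpers, something like download_pdf("example", extract_direct_url(url) or url) would fall back to the original URL when no gotourl parameter is present, and the header check would raise an error instead of silently writing an HTML page to a .pdf file.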
