Harutaka Kawamura

Reputation: 119

How to download a PDF file on a web page in Python

I'm trying to download the PDF file at the link below using Python.

Link

I tried to download it, but I couldn't open the saved file.
My PDF viewer reported "The format of source is not PDF."
Could someone tell me what is wrong?

import urllib2

def main():
    url = "https://www.osapublishing.org/view_article.cfm?gotourl=https%3A%2F%2Fwww%2Eosapublishing%2Eorg%2FDirectPDFAccess%2F42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099%2Foe-21-22-27371%2Epdf%3Fda%3D1%26id%3D274099%26seq%3D0%26mobile%3Dno&org="
    download_file("example", url)

def download_file(file_name, download_url):
    response = urllib2.urlopen(download_url)
    with open(file_name + ".pdf", "wb") as pdf_file:
        pdf_file.write(response.read())
    print("Completed")

if __name__ == "__main__":
    main()
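To confirm what the script actually saved, you can inspect the first bytes of the file. A PDF file begins with the header `%PDF-`, whereas an HTML page typically begins with `<!DOCTYPE html>` or `<html>`. The helper below is a hypothetical Python 3 heuristic (the name `sniff_format` is made up for illustration), not a complete content sniffer.

```python
import codecs


def sniff_format(path):
    """Guess from its first bytes whether a saved file is a PDF or an HTML page."""
    with open(path, "rb") as f:
        head = f.read(1024)
    # Every PDF file starts with a header such as b"%PDF-1.4".
    if head.startswith(b"%PDF-"):
        return "pdf"
    # HTML may start with a UTF-8 byte-order mark and/or whitespace before the first tag.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Running `sniff_format("example.pdf")` on the file saved by the script above would show whether the server returned a PDF or an HTML page.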

Upvotes: 2

Views: 3336

Answers (1)

Selcuk

Reputation: 59445

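Your URL does not point to the PDF itself but to an HTML page that displays the PDF in a frame. What you saved is that HTML page, which is why your viewer says it is not a PDF. Use the direct URL instead: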

url = "http://www.osapublishing.org/DirectPDFAccess/42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099/oe-21-22-27371.pdf?da=1&id=274099&seq=0&mobile=no"
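In Python 3, `urllib2` was split into `urllib.request` and `urllib.error`. A minimal sketch of the same download for Python 3, which also refuses to save anything that does not start with the PDF header, might look like this. The function name `download_pdf` is hypothetical, and checking the leading `%PDF-` bytes instead of trusting the `Content-Type` header alone is a heuristic choice, not something the server guarantees.

```python
import urllib.request

PDF_MAGIC = b"%PDF-"  # every PDF file begins with this header


def download_pdf(url, file_name):
    """Download `url` and save it as `file_name + ".pdf"`; return the saved path.

    Raises ValueError instead of writing the file when the response body does
    not start with the PDF header (e.g. when the URL is an HTML viewer page).
    """
    with urllib.request.urlopen(url) as response:
        # The declared media type, used only in the error message below.
        content_type = response.info().get_content_type()
        data = response.read()

    if not data.startswith(PDF_MAGIC):
        raise ValueError(
            "expected a PDF, got %s starting with %r" % (content_type, data[:20])
        )

    path = file_name + ".pdf"
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With the direct URL above, `download_pdf(url, "example")` would write `example.pdf`; with the original viewer URL it should raise `ValueError` rather than silently saving an HTML page under a `.pdf` name.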

You can find this direct URL by viewing the HTML source of the page at your original link.
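For this particular link there is also a shortcut: the viewer URL in the question carries the target address, percent-encoded, in its `gotourl` query parameter. The sketch below decodes that parameter with the standard library. It relies only on what this one URL happens to contain; treating `gotourl` as a stable OSA Publishing interface is an assumption, and the function name `direct_pdf_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def direct_pdf_url(viewer_url):
    """Return the decoded `gotourl` query parameter of `viewer_url`, or None if absent."""
    query = urlsplit(viewer_url).query
    # parse_qs percent-decodes the values, turning e.g. %3A%2F%2F back into "://".
    values = parse_qs(query).get("gotourl")
    return values[0] if values else None
```

Applied to the question's URL, this returns the same address as the direct URL above, except that it uses `https` instead of `http`.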

Upvotes: 2
