Reputation: 119
I'm trying to download the PDF file at the link below using Python.
The download completes, but I can't open the saved file.
My PDF viewer reports "The format of source is not PDF."
Could someone tell me what is wrong?
    import urllib2

    def main():
        url = "https://www.osapublishing.org/view_article.cfm?gotourl=https%3A%2F%2Fwww%2Eosapublishing%2Eorg%2FDirectPDFAccess%2F42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099%2Foe-21-22-27371%2Epdf%3Fda%3D1%26id%3D274099%26seq%3D0%26mobile%3Dno&org="
        download_file("example", url)

    def download_file(file_name, download_url):
        response = urllib2.urlopen(download_url)
        file = open(file_name + ".pdf", 'wb')
        file.write(response.read())
        file.close()
        print("Completed")

    if __name__ == "__main__":
        main()
Upvotes: 2
Views: 3336
Reputation: 59445
Your URL is not a link to a PDF but to an HTML frame that contains the PDF. Use the direct URL instead:
url = "http://www.osapublishing.org/DirectPDFAccess/42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099/oe-21-22-27371.pdf?da=1&id=274099&seq=0&mobile=no"
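If you want to catch this kind of mistake early, you can check the first bytes of the response before saving: a real PDF file starts with `%PDF-`, while an HTML page does not. A minimal sketch, assuming Python 3 (`urllib.request` in place of `urllib2`); `looks_like_pdf` and `download_pdf` are hypothetical helper names, not part of any library:

```python
import urllib.request


def looks_like_pdf(data):
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


def download_pdf(download_url, file_name):
    """Fetch download_url and save it as file_name + '.pdf' only if it is a PDF."""
    with urllib.request.urlopen(download_url) as response:
        data = response.read()
    if not looks_like_pdf(data):
        # Show a short prefix so it is obvious when HTML came back instead.
        raise ValueError("Server did not return a PDF; first bytes: %r" % data[:20])
    with open(file_name + ".pdf", "wb") as pdf_file:
        pdf_file.write(data)
```

Called with the `view_article.cfm` link from the question, `download_pdf` should raise an error rather than quietly saving HTML under a `.pdf` name.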
You can find the direct URL of the PDF by viewing the HTML source of the page your original link points to.
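In this particular case the direct URL also appears, percent-encoded, in the `gotourl` query parameter of the original link, so it can be decoded instead of copied by hand. A sketch assuming Python 3's `urllib.parse`; `extract_direct_url` is a hypothetical helper name, and other sites may not embed the target this way:

```python
from urllib.parse import parse_qs, urlsplit


def extract_direct_url(viewer_url, param="gotourl"):
    """Return the decoded value of `param` from viewer_url's query string."""
    query = parse_qs(urlsplit(viewer_url).query)
    values = query.get(param)
    if not values:
        raise ValueError("No %r parameter in URL" % param)
    return values[0]


# The viewer link from the question, split only for readability.
viewer_url = (
    "https://www.osapublishing.org/view_article.cfm?gotourl="
    "https%3A%2F%2Fwww%2Eosapublishing%2Eorg%2FDirectPDFAccess%2F"
    "42C574A0-ABB6-FD11-777A24C1C4C5ADEF_274099%2Foe-21-22-27371%2Epdf"
    "%3Fda%3D1%26id%3D274099%26seq%3D0%26mobile%3Dno&org="
)

direct_url = extract_direct_url(viewer_url)
print(direct_url)
```

The decoded value has the same host, path, and query string as the direct URL given above, with an `https` scheme.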
Upvotes: 2