Reputation: 33
I am using mechanize to automatically download some PDF documents from web pages. When there is a PDF icon on the page, I can get the file like this:
b.find_link(text="PDF download")
req = b.click_link(text="PDF download")
b.open(req)
Then I just write it to a new file.
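For reference, the "write it to a new file" step could look roughly like the sketch below. The helper name `save_response` and the output path are made up for illustration. It only assumes the response is a file-like object with a `read()` method, which is what `b.open(req)` returns.

```python
import shutil


def save_response(response, path):
    """Copy a file-like response to `path` (hypothetical helper).

    The file is opened in binary mode so the PDF bytes are written unchanged.
    """
    with open(path, "wb") as out:
        # Stream in chunks rather than reading the whole file into memory.
        shutil.copyfileobj(response, out)
    return path


# Possible usage with the snippet above (output filename is an assumption):
#   req = b.click_link(text="PDF download")
#   save_response(b.open(req), "document.pdf")
```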
However, some of the documents I need have no direct 'PDF download' link on the page. Instead, I have to click a 'submit' button to make a "delivery request" for the document. After I click this button, the download starts while I am taken to another page that says "delivery request in progress" and then, once the download has finished, "Your delivery request is complete".
I have tried using mechanize to click the submit button, and then save the file that downloads by doing this:
b.select_form(nr=0)
b.submit()
downloaded_file = b.response().read()
but this stores the HTML of the page I am redirected to, not the file that downloads.
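A quick way to confirm which of the two came back is to look at the first bytes of the body, since PDF files begin with the marker `%PDF-`. This is only a rough sketch, and the function name `looks_like_pdf` is invented for illustration:

```python
def looks_like_pdf(data):
    """Return True if `data` (bytes) starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Possible usage with the snippet above:
#   downloaded_file = b.response().read()
#   if not looks_like_pdf(downloaded_file):
#       print("Got a page instead of the PDF")
```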
How do I get the file that downloads after I click 'submit'?
Upvotes: 2
Views: 941
Reputation: 33
For anyone with a similar problem, I found a workaround. mechanize emulates a browser that doesn't have JavaScript, so I turned JavaScript off in my own browser too. When I then went to the download page, I could see a link that said 'if the download hasn't already started, click here to download'. I could then get mechanize to find that link, follow it in the normal way, and write the response to a new file.
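A minimal sketch of that workaround is below. The function name, the link pattern, and the output path are assumptions for illustration; the exact link text on your page may differ, which is why the link is matched with a regular expression through mechanize's `text_regex` argument. The form is assumed to be the first one on the page, as in the question.

```python
import re
import shutil


def download_via_fallback_link(browser, link_pattern, path):
    """Submit the first form, follow the fallback link, and save the file.

    `browser` is expected to be a mechanize.Browser that has already
    opened the page containing the 'submit' button.
    """
    browser.select_form(nr=0)
    browser.submit()  # lands on the "delivery request" page
    # Follow the plain link that is shown when JavaScript is unavailable.
    response = browser.follow_link(
        text_regex=re.compile(link_pattern, re.IGNORECASE)
    )
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # write the bytes unchanged
    return path


# Possible usage (the pattern and filename are placeholders):
#   import mechanize
#   b = mechanize.Browser()
#   b.open(page_url)  # page_url: the page with the submit button
#   download_via_fallback_link(b, r"click here", "document.pdf")
```

If no link matches, mechanize raises `LinkNotFoundError`, which usually means the pattern needs adjusting to the page's actual wording.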
Upvotes: 1