TW1

Reputation: 33

How do I get the PDF file that downloads when I click 'submit', which also redirects me to a new page?

I am using mechanize to automatically download PDF documents from web pages. When there is a PDF icon on the page, I can get the file like this:

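    # b is a mechanize.Browser() that has already opened the page
    req = b.click_link(text="PDF download")  # build the request for that link
    response = b.open(req)                   # response.read() gives the PDF bytes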

Then I just write it to a new file.
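For reference, the "write it to a new file" step can be sketched roughly like this. The helper name `save_response` and the output filename are made up for illustration; the only assumption is that the response object has a `read()` method returning bytes, as mechanize responses do.

```python
def save_response(response, path):
    """Write the body of a file-like response to path; return the byte count."""
    data = response.read()
    with open(path, "wb") as f:  # binary mode, since a PDF is not text
        f.write(data)
    return len(data)


# Hypothetical usage with the snippet above:
# save_response(b.open(req), "document.pdf")
```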

However, for some of the documents I need, there is no direct 'PDF download' link on the page. Instead I have to click a 'submit' button to make a "delivery request" for the document. After I click this button, the download starts and I am taken to another page which says "delivery request in progress" and then, once the download has finished, "Your delivery request is complete".

I have tried using mechanize to click the submit button, and then save the file that downloads by doing this:

    b.select_form(nr=0)
    b.submit()
    downloaded_file = b.response().read()

but this stores the HTML of the page I am redirected to, not the file that downloads.
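A quick way to confirm what actually came back is to look at the first few bytes: a PDF file starts with the marker `%PDF-`, while the redirect page will start with HTML. This is only a rough sketch, and `looks_like_pdf` is a hypothetical helper name, not part of mechanize.

```python
def looks_like_pdf(data):
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Hypothetical usage with the snippet above:
# if not looks_like_pdf(downloaded_file):
#     print("Got something other than a PDF, probably the HTML status page")
```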

How do I get the file that downloads after I click 'submit'?

Upvotes: 2

Views: 941

Answers (1)

TW1

Reputation: 33

For anyone with a similar problem, I found a workaround. mechanize emulates a browser that doesn't run JavaScript, so I turned JavaScript off in my own browser too. When I then went to the download page, I could see a link that said 'if the download hasn't already started, click here to download'. After that I could just get mechanize to find that link, follow it in the normal way, and write the response to a new file.
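Roughly, the workaround looks like the sketch below. The function name `fetch_via_fallback_link`, the URL, and the `"click here"` pattern are assumptions for illustration; the link text on your site may differ, so adjust the regex to match what you see with JavaScript turned off. The browser argument is expected to be a `mechanize.Browser()`.

```python
import re


def fetch_via_fallback_link(browser, page_url, link_pattern=r"click here"):
    """Submit the first form on page_url, then follow the plain fallback
    link on the resulting page and return the response body as bytes."""
    browser.open(page_url)
    browser.select_form(nr=0)
    browser.submit()  # lands on the "delivery request" HTML page

    # Without JavaScript, the page shows an ordinary link to the file.
    link = browser.find_link(text_regex=re.compile(link_pattern, re.I))
    response = browser.follow_link(link)
    return response.read()


# Hypothetical usage:
# import mechanize
# b = mechanize.Browser()
# data = fetch_via_fallback_link(b, "http://example.com/document-page")
# with open("document.pdf", "wb") as f:
#     f.write(data)
```

Matching on a regex rather than the exact link text keeps this from breaking if the site changes the wording around "click here" slightly.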

Upvotes: 1
