How to download a file pushed to a browser using python?

Question

I want to download a zip file using Python.

With a URL like http://server.com/file.zip this is quite simple: open it with urllib2.urlopen and write the response to a local file.
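For reference, a minimal sketch of that simple case. It assumes Python 3, where urllib.request replaces urllib2, and download_file is a hypothetical helper name, not something from the question:

```python
import shutil
import urllib.request


def download_file(url, dest_path):
    """Stream the response body at `url` into a local file and return its path."""
    # urlopen returns a file-like response; copyfileobj copies it in chunks,
    # so the whole archive never has to sit in memory at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Something like download_file("http://server.com/file.zip", "file.zip") would cover the direct-link case, but not the form-driven URL described next.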

But in my case I have this type of URL: http://server.com/customer/somedata/download?id=121&m=zip, and the download is launched after a form validation.

I should add that I want to deploy this on Heroku, so I can't use spynner, which is built with C++. The download is launched after a scrape that uses Scrapy.

From a browser the download works well: I get a valid zip file with its name. Using Python I just get HTML and header data...

Is there any way to get a file from this type of URL in Python?

ch3ka · Accepted Answer

This site serves JavaScript, which then triggers the download. You have no choice but to either:

a) evaluate the JavaScript in a simulated browser environment, or
b) work out manually what the JS does and re-implement it in Python, e.g. extract the URL and download key from the page source, possibly issue the AJAX request yourself, and finally download the file.

I generally recommend Mechanize for webpage-related automation, but it cannot handle JavaScript either, so I guess you can stick with Scrapy if you want to go for plan b).
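A rough sketch of what plan b) could look like with only the standard library (Python 3). Everything specific here is an assumption: the regex guesses that the page's script assigns the download URL to window.location, and extract_download_url, make_opener and download are hypothetical helper names. You would have to read the real page's JavaScript and adapt the pattern to it.

```python
import re
import shutil
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urljoin

# Assumed pattern: the script does something like
#   window.location.href = "/customer/somedata/download?id=121&m=zip";
# Adjust this to whatever the real page's JavaScript does.
DOWNLOAD_URL_RE = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)


def extract_download_url(html, base_url):
    """Return the absolute download URL found in the page's script, or None."""
    match = DOWNLOAD_URL_RE.search(html)
    if match is None:
        return None
    # The script may use a relative path, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))


def make_opener():
    """Build an opener that keeps cookies (e.g. a session set by the form) between requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def download(opener, url, dest_path):
    """Stream the response body at `url` into `dest_path` using the given opener."""
    with opener.open(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

The idea is to fetch the post-form page with the same opener that submitted the form, pass its HTML to extract_download_url, and then call download on the result so the session cookies travel with the request. Inside a Scrapy spider the same extraction step would apply to response.text.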
