Corentin Geoffray

Reputation: 695

How to download a file pushed to a browser using Python?

I want to download a zip file using Python.

With a direct URL such as http://server.com/file.zip, this is quite simple: call urllib2.urlopen and write the response to a local file.

In my case, however, the URL looks like this: http://server.com/customer/somedata/download?id=121&m=zip, and the download is launched only after a form validation.

I should mention that I want to deploy this on Heroku, so I can't use spynner, which is built with C++. The download is launched after a scraping step that uses Scrapy.

From a browser the download works well: I get a valid zip file with its proper name. Using Python, I just get HTML and header data.

Is there any way to get a file from this type of URL in Python?

Upvotes: 3

Views: 1136

Answers (2)

ch3ka

Reputation: 12168

This site serves JavaScript which then triggers the download. You have no choice but to either a) evaluate the JavaScript in a simulated browser environment, or b) work out manually what the JavaScript does and re-implement it in Python, e.g. extract the URL and download key from the page source, possibly issue an AJAX request, and finally download the file.

I generally recommend Mechanize for webpage-related automation, but it cannot deal with JavaScript either, so I guess you can stick with Scrapy if you want to go for plan b).
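As a rough illustration of plan b), here is a minimal sketch of the string-extraction step. It assumes the page's script redirects by assigning a quoted URL to `window.location` (or `window.location.href`); that pattern, the `LOCATION_RE` name and the `extract_download_url` helper are all hypothetical, so inspect the real page source and adapt the regular expression to whatever the JavaScript actually does.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: assumes the script contains something like
#   window.location.href = "/customer/somedata/download?id=121&m=zip";
LOCATION_RE = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)


def extract_download_url(html, base_url):
    """Return the absolute URL the script redirects to, or None if not found."""
    match = LOCATION_RE.search(html)
    if match is None:
        return None
    # The URL in the script may be relative, so resolve it against the page URL.
    return urljoin(base_url, match.group(1))
```

In a Scrapy callback you could pass the response body text and the response URL to this helper and then issue a new request for the result.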

Upvotes: 1

forivall

Reputation: 9913

When you do the download in the browser, open the network tab of the developer console and record the HTTP method (probably POST), the POST parameters, the cookies, and everything else that is part of the validation; then use a library to replicate that request.
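A minimal sketch of what replicating it could look like with only the Python 3 standard library (`urllib.request` is the successor to `urllib2`). It assumes the validation is a plain form POST that sets a session cookie, followed by a GET on the download URL; the form URL, the field names and the helper names below are placeholders, so substitute whatever the network tab actually shows.

```python
import os
import shutil
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session():
    """Return an opener that keeps cookies between requests, like a browser."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def encode_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def download_after_form(opener, form_url, form_fields, download_url,
                        fallback_name="download.zip"):
    """Submit the form, then fetch the file using the same cookies."""
    # Step 1: passing data= makes this a POST; the cookie jar stores any
    # session cookie the server sets in its response.
    opener.open(form_url, data=encode_form(form_fields)).close()

    # Step 2: request the file; the opener sends the stored cookies back.
    with opener.open(download_url) as response:
        # get_filename() reads the Content-Disposition header, if present.
        # basename() drops any directory part the server might send.
        name = os.path.basename(response.headers.get_filename() or "")
        name = name or fallback_name
        with open(name, "wb") as out:
            shutil.copyfileobj(response, out)
    return name
```

Usage would be along the lines of `download_after_form(build_session(), form_url, {"field": "value"}, "http://server.com/customer/somedata/download?id=121&m=zip")`. If the saved file still contains HTML, the server is probably checking something else you saw in the network tab, such as a Referer header or a hidden form token.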

Upvotes: 0
