Reputation: 695
I want to download a zip file using python.
With this type of url, http://server.com/file.zip this is quite simple by using urllib2.urlopen and writing it in a local file.
But in my case I have this type of url: http://server.com/customer/somedata/download?id=121&m=zip, the download is launched after a form validation.
It could be useful to precise that in my case I want to deploy it on heroku, so I can't use spynner that is built with C++. This download is launched after a scraping that uses scrapy.
From a browser the download works well, I get a good zip file with its name. Using python I just get html and header data...
Is there any way to get a file from this type of url in python ?
Upvotes: 3
Views: 1136
Reputation: 12168
This Site is serving JavaScript which then invokes the download. You have no choice but to: a) evaluate the JavaScript in a simulated Browser environment or b) parse manually what the JS does, and re-implement that in python. e.g. string extraction of the URL and download key, possibly invoking an AJAX request, and finally download the file
I generally recommend Mechanize for webpage related automation, but it cannot deal with JavaScript either, so I guess you can stick with Scrapy if you want to go for plan b).
Upvotes: 1
Reputation: 9913
When you do the download in the browser, open up the network tab of the developer console and record what HTTP method (probably POST), the POST parameters, the cookie, and everything else that is part of the validation; then use a library to replicate that.
Upvotes: 0