Reputation: 77
I tried to use urllib2 to fetch a zip file from a subtitle website.
The example website is http://sub.makedie.me and I tried to download this file http://sub.makedie.me/download/601943/Game%20of%20Thrones%20-%2005x08%20-%20Hardhome.KILLERS.English.HI.C.orig.Addic7ed.com.zip
I printed the URL in my script to test it, and it was fine: when I copied and pasted it into a web browser, I could download the file successfully.
At first, the script looked like this:
try:
    f = urllib2.urlopen(example_url)
    f.read()
    # something...
except urllib2.URLError, e:
    print e.code
But I got a 403 error. After searching, I tried changing the header to {'User-Agent': 'Mozilla/5.0'}. The code became:
try:
    req = urllib2.Request(example_url, headers={'User-Agent': 'Mozilla/5.0'})
    f = urllib2.urlopen(req)
    # something...
except urllib2.URLError, e:
    print e.code
Then I got a 402 error. Is this because of the website's settings, or because of an error in my code?
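For reference, a Python 3 sketch of the same header-setting request (urllib2 was split into urllib.request and urllib.error in Python 3); the URL argument and the output filename here are placeholders:

```python
# Python 3 sketch of the request above; urllib2's contents moved
# into urllib.request and urllib.error in Python 3.
import urllib.request
from urllib.error import HTTPError, URLError

def fetch(url, outname):
    # Send a browser-like User-Agent, as in the question.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as f:
            data = f.read()
        # Write the response body (the zip file) to disk.
        with open(outname, "wb") as out:
            out.write(data)
    except HTTPError as e:
        print(e.code)    # e.g. 403 or 402
    except URLError as e:
        print(e.reason)  # e.g. a DNS or connection failure

# e.g. fetch(example_url, "subtitle.zip")
```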
Upvotes: 0
Views: 1067
Reputation: 28405
I would try with:
urllib.urlretrieve(url, outname)
as you are trying to download the file rather than to open it.
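In Python 3 this function lives at urllib.request.urlretrieve; a minimal sketch, where outname is whatever local filename you want:

```python
# Python 3 sketch: urllib.urlretrieve is now urllib.request.urlretrieve.
import urllib.request

def download(url, outname):
    # urlretrieve streams the response body straight into outname and
    # returns a (local_filename, response_headers) pair.
    filename, headers = urllib.request.urlretrieve(url, outname)
    return filename
```

One caveat: urlretrieve gives no direct way to set a User-Agent header, so a server that rejects the default Python agent with a 403 will likely still do so.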
Upvotes: 1
Reputation: 5537
402 means the request isn't valid at the moment; the status code is reserved for future use.
From http://en.wikipedia.org/wiki/List_of_HTTP_status_codes :
402 Payment Required
Reserved for future use. The original intention was that this code might be used as part of some form of digital cash or micropayment scheme, but that has not happened, and this code is not usually used. YouTube uses this status if a particular IP address has made excessive requests, and requires the person to enter a CAPTCHA.
Hence there might be a CAPTCHA involved which is causing the issue.
Check the robots.txt file for the site: www.domain_name.com/robots.txt
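If you want to check that programmatically, Python's urllib.robotparser can evaluate robots.txt rules. The rules below are invented for illustration (the real file would be at http://sub.makedie.me/robots.txt):

```python
# Sketch: evaluate hypothetical robots.txt rules with urllib.robotparser.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Normally you would call rp.set_url(".../robots.txt") and rp.read();
# here we parse made-up rules directly instead of hitting the network.
rp.parse([
    "User-agent: *",
    "Disallow: /download/",
])
print(rp.can_fetch("Mozilla/5.0", "http://sub.makedie.me/download/file.zip"))  # False
print(rp.can_fetch("Mozilla/5.0", "http://sub.makedie.me/index.html"))         # True
```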
Upvotes: 1