Reputation: 1043
I am downloading a list of remote files. My code looks like the following:
    try:
        r = requests.get(url, stream=True, verify=False)
        total_length = int(r.headers['Content-Length'])
        if total_length:
            with open(file_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1024):
                    if chunk:
                        f.write(chunk)
                        f.flush()
    except (requests.RequestException, StandardError):
        pass
My problem is that requests downloads plain HTML for files that do not exist (for example, the 404 page or a similar HTML error page). Is there a way to avoid this? Is there a header I can check, such as Content-Type?
Solution:
I called r.raise_for_status() as per the accepted answer and also added an extra check on Content-Type, like:
    if r.headers['Content-Type'].split('/')[0] == "text":
        pass  # or raise here
(MIME types list here: http://www.freeformatter.com/mime-types-list.html)
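The Content-Type check above can be wrapped in a small helper. This is only a sketch: the name is_text_response is made up for illustration, and it assumes that a missing header should count as "not text". It takes any mapping with a .get method, so r.headers (which is case-insensitive in requests) or a plain dict both work.

```python
def is_text_response(headers):
    """Return True if the Content-Type header names a text/* media type."""
    # Assumption: a missing header yields "", which is treated as "not text".
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before looking at the type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # "text/html" -> "text", "application/zip" -> "application"
    return media_type.split("/", 1)[0] == "text"
```

With this, the check becomes `if is_text_response(r.headers):` followed by whatever skip or raise logic fits the download loop.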
Upvotes: 0
Views: 1232
Reputation: 1121814
Use r.raise_for_status() to raise an exception for responses with 4xx and 5xx status codes, or test r.status_code explicitly.
r.raise_for_status() raises an HTTPError exception, which is a subclass of RequestException, which you already catch:
    try:
        r = requests.get(url, stream=True, verify=False)
        r.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
        total_length = int(r.headers['Content-Length'])
        if total_length:
            # etc.
    except (requests.RequestException, StandardError):
        pass
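For completeness, here is one way the elided "etc." part might be put together with the status check. It is a sketch, not the asker's actual code: save_stream is a hypothetical helper, and it takes the response object as an argument so that the status check always runs before the output file is opened.

```python
def save_stream(response, file_name, chunk_size=1024):
    """Write a streamed response body to file_name; return bytes written."""
    # Raises for 4xx and 5xx responses before the file is opened,
    # so no file is created for an error page.
    response.raise_for_status()
    written = 0
    with open(file_name, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written

# Usage (needs the third-party requests package and network access):
#   import requests
#   try:
#       r = requests.get(url, stream=True)
#       save_stream(r, file_name)
#   except requests.RequestException:
#       pass  # the question's code ignores failed downloads
```

Passing the response in, rather than the URL, also makes the helper easy to exercise with a stand-in object that provides raise_for_status() and iter_content().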
The r.status_code check would let you narrow down what you consider a proper response code. Do note that 3xx redirects are handled automatically, and you won't see other 3xx responses because requests won't send conditional requests in this case, so there is little need for explicit tests here. If you do want one, it'd look something like:
    r = requests.get(url, stream=True, verify=False)
    r.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
    total_length = int(r.headers['Content-Length'])
    if 200 <= r.status_code < 300 and total_length:
        # etc.
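One caveat with the snippet above is that int(r.headers['Content-Length']) raises KeyError when the header is absent, which can happen with chunked transfer encoding. A hedged sketch of the same test as a standalone predicate follows; should_save is a hypothetical name, and treating a missing or malformed length as "do not save" is an assumption that may be too strict for some servers.

```python
def should_save(status_code, headers):
    """Return True for a 2xx response that declares a non-zero Content-Length."""
    if not 200 <= status_code < 300:
        return False
    try:
        # Assumption: a missing or malformed Content-Length means "do not save".
        # int("") raises ValueError, which covers the missing-header case.
        total_length = int(headers.get("Content-Length", ""))
    except ValueError:
        return False
    return total_length > 0
```

It would be called as `should_save(r.status_code, r.headers)` in place of the combined `if` condition.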
Upvotes: 4