How to get HTML content of 404 error page using python?

Question

I am using Python to fetch HTML from multiple pages of a site. I found that urllib raises an exception when a URL does not exist. How do I retrieve the HTML of the site's custom 404 error page (the page that says something like "Page is not found.")?

Current code:

try:
    req = Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
    client = urlopen(req)

    #downloading html data
    page_html = client.read()

    #closing connection
    client.close()
except:
    print("The following URL was not found. Program terminated.\n" + URL)
    break

Derwent · Accepted Answer

Have you tried the requests library?

Install the library with pip:

pip install requests

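Then use it like this. Unlike urllib, requests does not raise an exception for a 404 response by default, so the error page's body is available directly: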

import requests

response = requests.get('https://stackoverflow.com/nonexistent_path')
print(response.status_code) # 404
print(response.text) # Prints the raw HTML response
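
If you would rather stay with urllib, as in the question, a minimal sketch is below. It relies on the fact that urllib.error.HTTPError is itself a response-like object, so the body of the error page can be read from the exception. The helper name fetch_html is hypothetical (it is not from the question or from any library), and the User-Agent header is simply copied from the question's code.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_html(url):
    """Return (status_code, body_bytes), also for error responses such as 404.

    fetch_html is a hypothetical helper name used only for this sketch.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urlopen(req) as client:
            return client.status, client.read()
    except HTTPError as err:
        # HTTPError carries the server's response, so the custom
        # error page can be read from the exception itself.
        return err.code, err.read()
```

Calling fetch_html(URL) inside the loop should then give both the status code and the page body, so the loop can decide whether to skip the page or stop. Catching HTTPError specifically, rather than using a bare except, keeps unrelated failures such as DNS errors or timeouts visible instead of reporting them as "not found".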
