shurup

Reputation: 871

How to get the HTML content of a 404 error page using Python?

I am using Python to get HTML data from multiple pages at a URL. I found that urllib raises an exception when a URL does not exist. How do I retrieve the HTML of the custom 404 error page (the page that says something like "Page is not found.")?

Current code:

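from urllib.request import Request, urlopen

# Note: this snippet presumably runs inside a loop over URLs (hence the break).
try:
    req = Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
    client = urlopen(req)

    # Download the HTML data
    page_html = client.read()

    # Close the connection
    client.close()
except:
    print("The following URL was not found. Program terminated.\n" + URL)
    break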

Upvotes: 0

Views: 1508

Answers (2)

Malady

Reputation: 263

Posting this to preserve the comment that also answers the question, and because it is what I was looking for: a way to do this without going outside the standard library.

By t.m.adam on Nov 4, 2018 at 10:07:

See HTTPError. It has a .read() method which returns the response content.
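A minimal sketch of what that comment suggests, using only urllib. The helper name fetch_html is hypothetical (it is not from the question), and the User-Agent header is copied from the question's code. The idea is that the HTTPError raised by urlopen is itself a response-like object, so the error page's body can be read from it:

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_html(url):
    """Return (status_code, body_bytes), including the body of error pages.

    fetch_html is an illustrative name, not part of urllib.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        # Successful responses are read as usual; the with-block closes
        # the connection for us.
        with urlopen(req) as client:
            return client.status, client.read()
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses. The exception
        # carries the status in .code and the server's error page in .read().
        return err.code, err.read()
```

Calling status, page_html = fetch_html(URL) on a missing page should then give 404 together with the bytes of the custom error page, instead of ending up in a bare except. Other failures (for example an unreachable host, which raises URLError) are deliberately not caught here.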

Upvotes: 0

Derwent

Reputation: 625

Have you tried the requests library?

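Just install the library with pip:

pip install requests

Then use it like this. Unlike urlopen, requests.get does not raise an exception for a 404 response by default, so the error page's body is available directly: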

import requests

response = requests.get('https://stackoverflow.com/nonexistent_path')
print(response.status_code) # 404
print(response.text) # Prints the raw HTML response

Upvotes: 2
