K. Michael

Reputation: 75

How do I get the HTML of a website using Python 3?

I've been trying to do this on repl.it and have tried several solutions from this site, but none of them work. Right now, my code looks like this:

import urllib
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
print (urllib.urlopen(url).read())

but it just says "AttributeError: module 'urllib' has no attribute 'urlopen'".

If I add import urllib.urlopen, it tells me there's no module named that. How can I fix my problem?

Upvotes: 0

Views: 9031

Answers (2)

ie__ll

Reputation: 21

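In Python 3, you can use the third-party requests library (installed separately, for example with pip install requests):

import requests
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = requests.get(url)
# r.text is the response body decoded to a str
print(r.text)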

or

import urllib.request
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)

Upvotes: 2

john_science

Reputation: 6561

The syntax you are using for urllib is from Python 2. In Python 3 the library was reorganized, and urlopen now lives in the urllib.request submodule. The updated code looks like this:

import urllib.request
response = urllib.request.urlopen("http://www.google.com")
html = response.read()

In Python 3, the html object is a bytes object holding the returned HTML of the page; call html.decode("utf-8") (or whichever encoding the page uses) to get a str. Much like with the original urllib library, you should not expect images or other data files to be included in this returned object.
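As a minimal sketch of turning the response into text, the helper below decodes the bytes that read() returns. The function name fetch_html and the UTF-8 fallback are assumptions made for illustration, not part of urllib; the charset is taken from the response's Content-Type header when the server provides one.

```python
import urllib.request


def fetch_html(url, default_encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
        # Use the charset declared by the server, if any; otherwise
        # fall back to default_encoding (an assumption, not a guarantee).
        charset = response.headers.get_content_charset() or default_encoding
    # errors="replace" avoids an exception if the guess is wrong.
    return raw.decode(charset, errors="replace")
```

For the URL in the question, print(fetch_html("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345")) would then print ordinary text rather than a b'...' literal.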

The confusing part is that, in Python 3, the following usually fails with an AttributeError:

import urllib
response = urllib.request.urlopen("http://www.google.com")
html = response.read()

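This module-importing behavior is intended: importing a package does not automatically import its submodules, so import urllib on its own does not load urllib.request. It only appears to work when some other module has already imported urllib.request for you. It is still non-intuitive and awkward, and more importantly for you, it makes the situation harder to debug. Enjoy.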
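The import behavior can be checked directly by looking at sys.modules before and after the explicit submodule import. This is only a rough sketch with illustrative variable names; the "before" value depends on what else has already been imported in the running interpreter, so no particular output is claimed for it.

```python
import sys
import urllib

# Often False in a fresh interpreter, but it can be True if another
# module has already imported urllib.request as a side effect.
loaded_before = "urllib.request" in sys.modules

import urllib.request  # explicitly load the submodule

# After the explicit import, the submodule is registered in sys.modules
# and is reachable as an attribute of the urllib package.
loaded_after = "urllib.request" in sys.modules
has_attr_after = hasattr(urllib, "request")

print(loaded_before, loaded_after, has_attr_after)
```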

Upvotes: 4
