Reputation: 94
My Python 3.4.4 code is:
import urllib.request
from bs4 import BeautifulSoup
from html.parser import HTMLParser
urls = 'file:///C:/Users/tarunuday/Documents/scrapdata/mech.html'
htmlfile = urllib.request.urlopen(urls)
soup = BeautifulSoup(htmlfile,html.parser)
I'm getting this error:
Traceback (most recent call last):
File "C:\Python34\saved\scrapping\scrapping2.py", line 7, in <module>
soup = BeautifulSoup(htmlfile,html.parser)
NameError: name 'html' is not defined
Now I understand that HTMLParser is Py2.x and html.parser is Py3.x, but how can I get this to work? The bs4 site says: "If you get the ImportError 'No module named html.parser', your problem is that you're running the Python 3 version of the code under Python 2." But I'm running 3.x, and I'm getting a NameError, not an ImportError.
Upvotes: 3
Views: 16954
Reputation: 318
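In your code, html.parser is supposed to be the parser's name as a string, so it needs quotes around it: "html.parser". Without the quotes, Python tries to look up a variable called html, which doesn't exist.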
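A minimal sketch of the corrected flow, assuming bs4 is installed. To keep it self-contained it writes a small hypothetical stand-in for mech.html to a temporary directory instead of using the asker's C:/Users/... path, then opens it through a file:// URL as the question does. The only real change is the quoted "html.parser":

```python
import pathlib
import tempfile
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of mech.html.
PAGE = "<html><head><title>Test</title></head><body><p>Hello</p></body></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "mech.html"
    path.write_text(PAGE, encoding="utf-8")

    # path.as_uri() builds a file:/// URL like the one in the question.
    with urllib.request.urlopen(path.as_uri()) as htmlfile:
        # The parser name is passed as a plain string, so no extra import is needed.
        soup = BeautifulSoup(htmlfile, "html.parser")

print(soup.title.string)  # Test
```

The HTMLParser import from the question can simply be dropped; BeautifulSoup selects the standard-library parser by name.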
Upvotes: 1
Reputation: 599876
The error is correct: you haven't defined html anywhere. The documentation you link to shows that you should pass "html.parser" as a string; it doesn't look like you need to import HTMLParser at all.
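The reason this is a NameError rather than an ImportError: the import itself succeeds, but from html.parser import HTMLParser binds only the name HTMLParser in your module, not the package name html. The small illustration below (standard library only) evaluates the same unquoted expression used in the question and captures the resulting error:

```python
# The import succeeds, so there is no ImportError...
from html.parser import HTMLParser

# ...but only HTMLParser is bound here; the name `html` is not.
try:
    html.parser  # the same unquoted expression passed to BeautifulSoup
    outcome = "defined"
except NameError as exc:
    outcome = "NameError: " + str(exc)

print(outcome)  # NameError: name 'html' is not defined
```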
Upvotes: 5