Tarun Uday

Reputation: 94

NameError: name 'html' is not defined with BeautifulSoup4

My Python 3.4.4 code is:

import urllib.request
from bs4 import BeautifulSoup
from html.parser import HTMLParser

urls = 'file:///C:/Users/tarunuday/Documents/scrapdata/mech.html'
htmlfile = urllib.request.urlopen(urls)
soup = BeautifulSoup(htmlfile,html.parser)

I'm getting this error:

Traceback (most recent call last):
  File "C:\Python34\saved\scrapping\scrapping2.py", line 7, in <module>
    soup = BeautifulSoup(htmlfile,html.parser)
NameError: name 'html' is not defined

I understand that HTMLParser is the Python 2.x module name and html.parser is the Python 3.x one, but how can I get this to work? The bs4 site says: "If you get the ImportError “No module named html.parser”, your problem is that you’re running the Python 3 version of the code under Python 2." But I'm running 3.x, and I'm getting a NameError, not an ImportError.

Upvotes: 3

Views: 16954

Answers (2)

Almuntasir Abir

Reputation: 318

In your code, html.parser is meant to be a string, so it needs quotes around it: BeautifulSoup(htmlfile, "html.parser").
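A minimal sketch of the corrected script, assuming the beautifulsoup4 package is installed. The only real fix is passing "html.parser" as a string; the HTMLParser import is dropped because bs4 looks the parser up by name. So the example runs on any machine, a temporary file with made-up contents stands in for mech.html, and parse_local_html is a hypothetical helper name, not part of bs4.

```python
import pathlib
import tempfile
import urllib.request

from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed


def parse_local_html(path):
    """Open a local HTML file through a file:// URL and parse it with bs4."""
    url = pathlib.Path(path).resolve().as_uri()  # e.g. file:///C:/.../mech.html
    with urllib.request.urlopen(url) as htmlfile:
        # The parser is named by a plain string; no html.parser import needed.
        return BeautifulSoup(htmlfile, "html.parser")


# Hypothetical stand-in for mech.html so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "mech.html"
    sample.write_text(
        "<html><head><title>Mech</title></head><body><p>ok</p></body></html>",
        encoding="utf-8",
    )
    soup = parse_local_html(sample)

print(soup.title.string)  # Mech
```

With the real file you could call parse_local_html(r'C:\Users\tarunuday\Documents\scrapdata\mech.html'), or keep the question's urlopen line as it is and change only the BeautifulSoup call.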

Upvotes: 1

Daniel Roseman

Reputation: 599876

The error is correct: you haven't defined html anywhere. The line "from html.parser import HTMLParser" binds only the name HTMLParser, not html. The documentation you link to shows that you should pass "html.parser" as a string; it doesn't look like you need to import HTMLParser at all.
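To see why html is undefined, this sketch (standard library only) executes each import statement in an empty namespace and lists the names it creates. names_bound_by is a hypothetical helper written just for this illustration.

```python
def names_bound_by(statement):
    """Run an import statement in an empty namespace; return the names it binds."""
    namespace = {}
    exec(statement, namespace)
    namespace.pop("__builtins__", None)  # exec adds this key automatically
    return sorted(namespace)


print(names_bound_by("from html.parser import HTMLParser"))  # ['HTMLParser']
print(names_bound_by("import html.parser"))  # ['html']

# Reproduce the question's error without bs4: only HTMLParser was imported,
# so evaluating html.parser fails on the name html.
try:
    exec("from html.parser import HTMLParser\nhtml.parser", {})
except NameError as exc:
    print("NameError:", exc)  # NameError: name 'html' is not defined
```

Only "import html.parser" (or "import html") creates the name html. BeautifulSoup itself never needs that name, just the string "html.parser".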

Upvotes: 5
