Reputation: 43
I am trying to create a webpage scraper and I want to use BeautifulSoup to do so. I installed BeautifulSoup 4.3.2, as the website said it was compatible with Python 3.x. I used
pip install beautifulsoup4
to install it. But when I run
from bs4 import BeautifulSoup
import requests
url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    print(link.get('href'))
I get an error that says
Traceback (most recent call last):
  File "/Users/user/Desktop/project.py", line 1, in <module>
    from bs4 import BeautifulSoup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
ImportError: cannot import name _htmlparser
Upvotes: 2
Views: 4948
Reputation: 115
I just edited bs4/builder/_htmlparser.py so that:
A) HTMLParseError is no longer imported; only HTMLParser is:
from html.parser import HTMLParser
B) the HTMLParseError class is defined locally instead:
class HTMLParseError(Exception):
    """Exception raised for all parse errors."""

    def __init__(self, msg, position=(None, None)):
        assert msg
        self.msg = msg
        self.lineno = position[0]
        self.offset = position[1]

    def __str__(self):
        result = self.msg
        if self.lineno is not None:
            result = result + ", at line %d" % self.lineno
        if self.offset is not None:
            result = result + ", column %d" % (self.offset + 1)
        return result
This probably isn't the best fix, since the parser will never actually raise this HTMLParseError. But the exception would just go uncaught and unhandled anyway.
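A slightly less invasive variant of the same edit, as a minimal sketch: rather than deleting the import outright, try it first and only fall back to the local class when html.parser doesn't provide HTMLParseError. The fallback class is the same one shown above; where exactly this would sit inside _htmlparser.py is an assumption on my part, I have not checked it against every bs4 release.

```python
# Sketch only: keep the real HTMLParseError when this Python still ships it,
# otherwise define a stand-in with the same constructor and message format.
from html.parser import HTMLParser

try:
    from html.parser import HTMLParseError
except ImportError:
    class HTMLParseError(Exception):
        """Exception raised for all parse errors."""

        def __init__(self, msg, position=(None, None)):
            assert msg
            self.msg = msg
            self.lineno = position[0]
            self.offset = position[1]

        def __str__(self):
            result = self.msg
            if self.lineno is not None:
                result = result + ", at line %d" % self.lineno
            if self.offset is not None:
                result = result + ", column %d" % (self.offset + 1)
            return result
```

Either way, code that does "except HTMLParseError" keeps working, and the file needs no further edits if you later move to a different Python version.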
Upvotes: 0
Reputation: 14169
Just installed Python 3.x on my end and tested the latest download of BS4. Didn't work. However, a fix can be found here: https://github.com/il-vladislav/BeautifulSoup4 (credits to GitHub user Il Vladislav, whoever you are).
Download the zip, overwrite the bs4 folder inside your BeautifulSoup download, then reinstall it via python setup.py install. It works now on my end, as you can see in the screenshot below, which shows the error first and then the script running to completion.
Code:
from bs4 import BeautifulSoup
import requests
url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    print(link.get('href'))
Screenshot: (image not available)
Relevant SO topic found here, showing that BS4 is not totally compatible with Python 3.x yet (even after 2 years).
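If you just need the links while the bs4 install is being sorted out, the same loop can be approximated with the standard library alone. To be clear, this swaps in html.parser for BeautifulSoup, so it is a stopgap sketch and not a replacement; LinkCollector and extract_links are names I made up for illustration, and the sample HTML is invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def extract_links(html_text):
    """Return the href of each <a> tag in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Invented sample; in the script above this would be the decoded page content.
sample = '<p><a href="http://example.com/">one</a> <a name="x">no href</a> <a href="/two">two</a></p>'
for href in extract_links(sample):
    print(href)
```

One difference from the BeautifulSoup loop: anchors without an href are skipped here, whereas link.get('href') would print None for them.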
Upvotes: 1
Reputation: 18898
I think there might be an error in the source file, specifically here:
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
In my installation, line 308 of bs4/builder/__init__.py reads:
from . import _htmlparser
You could probably just fix it there and see whether bs4 then imports successfully. Not sure which version of bs4 you have installed, but mine is 4.3.2, and _htmlparser.py is also in bs4/builder.
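To compare your copy with mine without opening the file by hand, something like the following should print the line in question. It is only a sketch: read_source_line is a helper name I made up, it locates the package without importing it (so the broken import doesn't get in the way), and it relies on importlib.util.find_spec, which needs Python 3.4 or newer.

```python
import importlib.util
import os


def read_source_line(package, relative_path, lineno):
    """Return (path, text) for 1-based line `lineno` of a file inside an
    installed package, locating the package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError("package %r not found" % package)
    package_dir = list(spec.submodule_search_locations)[0]
    path = os.path.join(package_dir, relative_path)
    with open(path, encoding="utf-8") as source:
        for number, line in enumerate(source, start=1):
            if number == lineno:
                return path, line.rstrip("\n")
    raise IndexError("%s has fewer than %d lines" % (path, lineno))


# Only attempt the lookup when a bs4 package is actually installed.
if importlib.util.find_spec("bs4") is not None:
    try:
        path, line = read_source_line(
            "bs4", os.path.join("builder", "__init__.py"), 308)
        print(path)
        print(line)
    except (IndexError, OSError) as exc:
        print("could not read that line:", exc)
```

The printed path also tells you which site-packages directory the import is really coming from, which helps if more than one Python is installed.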
Upvotes: 1