heyItsTy1992

Reputation: 43

BeautifulSoup4 throwing an error in Python 3.x

I am trying to create a webpage scraper and want to use BeautifulSoup to do so. I installed BeautifulSoup 4.3.2, since the website said it was compatible with Python 3.x. I used

pip install beautifulsoup4

to install it. But when I run

from bs4 import BeautifulSoup
import requests

url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    print(link.get('href'))

I get an error that says

Traceback (most recent call last):
  File "/Users/user/Desktop/project.py", line 1, in <module>
    from bs4 import BeautifulSoup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
ImportError: cannot import name _htmlparser

Upvotes: 2

Views: 4948

Answers (3)

ACEnglish

Reputation: 115

I just edited bs4/builder/_htmlparser.py so that:

A) HTMLParseError is no longer imported:

from html.parser import HTMLParser

B) The HTMLParseError class is defined in the file itself:

class HTMLParseError(Exception):
    """Exception raised for all parse errors."""

    def __init__(self, msg, position=(None, None)):
        assert msg
        self.msg = msg
        self.lineno = position[0]
        self.offset = position[1]

    def __str__(self):
        result = self.msg
        if self.lineno is not None:
            result = result + ", at line %d" % self.lineno
        if self.offset is not None:
            result = result + ", column %d" % (self.offset + 1)
        return result

This probably isn't the best fix, since HTMLParseError will never actually be raised. But the exception would have gone uncaught and unhandled anyway.
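For what it's worth, a less invasive variant of the same idea is to try the import first and only fall back to a local class when it fails, so the file keeps working on interpreters that still provide HTMLParseError (as far as I know it was removed from html.parser in Python 3.5). This is only a sketch of the pattern, not what bs4 itself ships, and the fallback class is my own stand-in:

```python
from html.parser import HTMLParser

try:
    # Only available on older Python 3 releases.
    from html.parser import HTMLParseError
except ImportError:
    # Assumed fallback: a local stand-in so that any
    # "except HTMLParseError" clause in the file still resolves.
    class HTMLParseError(Exception):
        """Exception raised for all parse errors."""

        def __init__(self, msg, position=(None, None)):
            assert msg
            self.msg = msg
            self.lineno = position[0]
            self.offset = position[1]

        def __str__(self):
            result = self.msg
            if self.lineno is not None:
                result = result + ", at line %d" % self.lineno
            if self.offset is not None:
                result = result + ", column %d" % (self.offset + 1)
            return result
```

With this in place, nothing else in the file needs to know which Python version it is running on.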

Upvotes: 0

WGS

Reputation: 14169

I just installed Python 3.x on my end and tested the latest download of BS4. It didn't work. However, a fix can be found here: https://github.com/il-vladislav/BeautifulSoup4 (credits to GitHub user Il Vladislav, whoever you are).

Download the zip, overwrite the bs4 folder inside your BeautifulSoup download, then reinstall it via python setup.py install. It works now on my end, as you can see in the screenshot below, which shows the error before the fix and a successful run after it.

Code:

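from bs4 import BeautifulSoup
import requests

# Build the full address from the host name the user types in.
url = input("Enter a URL (start with www): ")
link = "http://" + url

# Download the page and parse the raw bytes.
data = requests.get(link).content
soup = BeautifulSoup(data)

# Print the href of every anchor tag on the page.
for link in soup.find_all('a'):
    print(link.get('href'))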

Screenshot: (image not included; it showed the error before the fix and a successful run after it)

Relevant SO topic found here, showing that BS4 is not totally compatible with Python 3.x yet (even after 2 years).
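To confirm that a reinstall actually took effect, a quick check along these lines may help. It is just a sketch: check_import is a hypothetical helper name, and it assumes the package exposes a __version__ attribute (it reports "unknown" otherwise).

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, detail).

    detail is the module's version string on success,
    or the error type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, SyntaxError) as exc:
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, getattr(module, "__version__", "unknown")


ok, detail = check_import("bs4")
if ok:
    print("bs4 imported fine, version:", detail)
else:
    print("bs4 import failed:", detail)
```

Run it with the same interpreter you use for the scraper, since each Python installation has its own site-packages.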

Upvotes: 1

metatoaster

Reputation: 18898

I think there might be an error in the source file, specifically here:

  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser

In my installation, line 308 of bs4/builder/__init__.py reads

  from . import _htmlparser

You could probably just fix it there and see whether bs4 then imports successfully. I'm not sure which version of bs4 you have installed, but mine is 4.3.2, and _htmlparser.py is also in bs4/builder.
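If you want to look at that line without opening the file by hand, something like the following could work. It is a rough sketch that assumes Python 3.4 or newer (for importlib.util.find_spec and pathlib), and show_source_line is a hypothetical helper name. It locates the package on disk without importing it, so it should work even while the import itself is broken.

```python
import importlib.util
from pathlib import Path


def show_source_line(package, relative_path, line_number):
    """Return one line of a file inside an installed package, or None.

    The package is located via its import spec and is never executed.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package not installed, or no file on disk
    target = Path(spec.origin).parent / relative_path
    if not target.is_file():
        return None
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    if not 1 <= line_number <= len(lines):
        return None
    return lines[line_number - 1]


# Line 308 is the one named in the traceback above.
print(show_source_line("bs4", "builder/__init__.py", 308))
```

If it prints the import with two dots rather than one, the installed copy differs from mine.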

Upvotes: 1
