SBF12345

Reputation: 75

HTML Parsing w/ BS4: couldn't find a tree builder...'html parser'

I am not sure how to proceed after getting an error in my PyDev console.

The console returns the following:

  b'<!DOCTYPE html>\n<html>\n    <head>\n        <title>A simple example page</title>\n    </head>\n    <body>\n        <p>Here is some simple content for this page.</p>\n    </body>\n</html>'
Traceback (most recent call last):
  File "C:\Users\RainShadow\eclipse-workspace\test0\test2.py", line 7, in <module>
    soup = BeautifulSoup(page.content, 'html parser')
  File "C:\Users\RainShadow\Desktop\PythonLibs\BeautifulSoup4\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html parser. Do you need to install a parser library?

The code I ran to generate the above console output is below:

import requests

page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html")
print(page.content)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html parser')

print(soup.prettify())

My question: where can I download a tree builder with the feature 'html parser'?

Upvotes: 1

Views: 2651

Answers (1)

Josh McMillan

Reputation: 734

Try this when initializing BS:

soup = BeautifulSoup(page.content, 'html.parser')

Note the period (.) rather than a space. html.parser is the name of the HTML parser in Python's standard library, so there is nothing extra to download or install, and it should parse the page to the level you need. See this documentation for more info.
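
To check the fix without hitting the network, here is a minimal sketch that parses the same bytes shown in the question's console output. The make_soup helper and its fallback behaviour are my own illustration (not part of bs4); it simply retries with 'html.parser' if the requested parser name is not recognised:

```python
from bs4 import BeautifulSoup, FeatureNotFound

# The same bytes that print(page.content) showed in the question,
# inlined here so the example does not depend on a network request.
html = (
    b"<!DOCTYPE html>\n<html>\n    <head>\n"
    b"        <title>A simple example page</title>\n    </head>\n"
    b"    <body>\n        <p>Here is some simple content for this page.</p>\n"
    b"    </body>\n</html>"
)


def make_soup(markup, parser="html.parser"):
    """Hypothetical helper: parse markup, falling back to the built-in
    'html.parser' when bs4 does not recognise the requested parser name."""
    try:
        return BeautifulSoup(markup, parser)
    except FeatureNotFound:
        # e.g. 'html parser' (with a space) ends up here
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(html)
print(soup.title.string)
print(soup.find("p").get_text())
```

In your own script you would normally just pass 'html.parser' directly, as in the line above; the fallback is only there to show that the misspelled name is what raises FeatureNotFound.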

Upvotes: 2
