Reputation: 1209
In lxml's documentation, it says:
lxml can interface to the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.
Meanwhile, BS can also use lxml as the parser. [ref]
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser.
BS also suggests using lxml as the parser for speed.
So what happens if lxml uses BS for parsing while BS, in turn, uses lxml as its parser?
I have been scratching my head over understanding their relationship. Help.
Upvotes: 3
Views: 3173
Reputation: 89285
There is nothing confusing about the BS parser and the lxml.html parser. BS has a default HTML parser (the one from Python's standard library, per the documentation you quoted), and lxml has its own HTML parser.
The BS documentation you quoted simply says that you can parse HTML into a BS soup object using the lxml parser, or another third-party parser, as an alternative to the default BS parser:
BeautifulSoup(markup, "lxml")
Similarly, the lxml documentation says that you can parse HTML into an lxml tree object using the BS parser, as an alternative to the default lxml.html parser:
root = lxml.html.soupparser.fromstring(tag_soup)
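To make the difference concrete, here is a rough side-by-side sketch of the two directions. The sample markup in tag_soup is made up for illustration, and it assumes both the bs4 and lxml packages are installed:

```python
from bs4 import BeautifulSoup
import lxml.html
from lxml.html import soupparser

# Made-up sample markup, purely for illustration.
tag_soup = "<p>Hello <b>world</b></p>"

# Direction 1: the result is a BeautifulSoup object;
# lxml is only used as the underlying parser.
soup = BeautifulSoup(tag_soup, "lxml")
print(type(soup).__name__, soup.b.get_text())

# Direction 2: the result is an lxml element tree;
# BeautifulSoup is only used as the underlying parser.
root = soupparser.fromstring(tag_soup)
print(type(root).__name__, root.findtext(".//b"))

# For comparison: lxml's own HTML parser, no BeautifulSoup involved.
default_root = lxml.html.fromstring(tag_soup)
print(type(default_root).__name__)
```

In each call you pick which library produces the final object and which one does the parsing, so as far as I can tell the two never end up calling each other in a loop.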
Upvotes: 6