nn0p

Reputation: 1209

What's the relationship between 'BeautifulSoup' and 'lxml'?

The lxml documentation says:

lxml can interface to the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.

Meanwhile, BS can also use lxml as its parser. [ref]

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser.

BS also recommends using lxml as the parser for speed.

So what happens if lxml uses BS for parsing while BS, in turn, uses lxml as its parser?

I have been scratching my head over understanding their relationship. Help.

Upvotes: 3

Views: 3173

Answers (1)

har07

Reputation: 89285

There is nothing circular here. BS has a default HTML parser (in Beautiful Soup 4, Python's built-in html.parser), and lxml has its own HTML parser. Each library simply lets you swap in the other's parser as an option.

The BS documentation you quoted says that you can parse HTML into a BS soup object using the lxml parser, or another third-party parser, as an alternative to the default BS parser:

BeautifulSoup(markup, "lxml")

Similarly, the lxml documentation says that you can parse HTML into an lxml tree object using the BS parser, as an alternative to the default lxml.html parser:

root = lxml.html.soupparser.fromstring(tag_soup)
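
To make the two directions concrete, here is a minimal sketch that runs both on the same broken markup. It assumes both beautifulsoup4 and lxml are installed; the sample string and variable names are only illustrative. The object you get back is decided by the API you call, not by the parser doing the work underneath.

```python
from bs4 import BeautifulSoup
from lxml.html import soupparser

# Illustrative tag soup: the <p> and <b> tags are never closed.
tag_soup = "<p>Hello<b>world"

# Direction 1: Beautiful Soup API, with lxml doing the actual parsing.
# The result is a BeautifulSoup object, navigated the BS way.
soup = BeautifulSoup(tag_soup, "lxml")
print(type(soup).__name__, soup.b.get_text())

# Direction 2: lxml API, with Beautiful Soup doing the actual parsing.
# The result is an lxml element tree, queried the lxml way.
root = soupparser.fromstring(tag_soup)
print(type(root).__name__, root.findtext(".//b"))
```

As far as I know, soupparser asks Beautiful Soup 4 to use html.parser unless you pass a different features argument, so the second call does not loop back into lxml's own parser.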

Upvotes: 6
