user2054545

Reputation: 161

How to build html5lib parser to deal with a mixture of XML and HTML tags?

I am trying to use BeautifulSoup to parse an HTML file that consists of many individual documents downloaded as a batch from LexisNexis (a legal database).

Upvotes: 5

Views: 218

Answers (1)

That1Guy

Reputation: 7233

You can pass 'xml' as the parser when you instantiate your BeautifulSoup object in bs4 (this parser requires lxml to be installed):

xml_soup = BeautifulSoup(xml_object, 'xml')

This should take care of your issue. You can use the xml_soup object to parse the remaining HTML; however, I'd recommend instantiating a separate soup object specifically for the HTML:

soup = BeautifulSoup(html_object, 'html5lib')
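To make the two-object approach concrete, here is a minimal sketch. The sample strings and the tag names in them are made up for illustration and are not taken from the LexisNexis file. It assumes lxml is installed for the 'xml' parser, and it uses the built-in 'html.parser' for the HTML side only to keep the example dependency-light; 'html5lib' could be substituted if it is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample inputs (not from the original question).
xml_object = "<doc><title>Case A</title><body>Text of case A</body></doc>"
html_object = "<html><body><p class='c1'>First paragraph</p></body></html>"

# The 'xml' parser is provided by lxml, so lxml must be installed.
xml_soup = BeautifulSoup(xml_object, "xml")

# A separate soup for the HTML; naming the parser explicitly avoids
# bs4 guessing one for you.
soup = BeautifulSoup(html_object, "html.parser")

# Pull one value out of each tree.
xml_title = xml_soup.find("title").get_text()
first_paragraph = soup.find("p", class_="c1").get_text()

print(xml_title)
print(first_paragraph)
```

Keeping the two trees separate means the XML side keeps its tag names and nesting untouched, while the HTML side gets the more forgiving HTML handling.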

Upvotes: 1
