How to build an html5lib parser to deal with a mixture of XML and HTML tags?

Question

I am trying to use BeautifulSoup to parse an HTML file that consists of many individual documents downloaded as a batch from LexisNexis (a legal database).

My first task is to split the HTML file into its constituent documents. I thought this would be easy, since each document is wrapped in its own tag: an opening tag, the body of the 1st document, a closing tag, and so on.
However, this tag is an XML tag, not an HTML tag (all other tags in the file are HTML). As a result, the regular HTML parser does not expose this tag in the tree.
How can I build a parser in bs4 that will pick up this XML tag? I enclose the relevant section of the HTML file:

BODY

Answers (1)
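
A minimal sketch of one way to approach the split described in the question, not a definitive fix. It assumes the wrapper is an ordinary element in the markup (not hidden inside an HTML comment) and uses a hypothetical `<DOC>` tag as a stand-in, since the real tag name is not shown in the question. The `split_documents` helper and the sample markup are likewise made up for illustration. It relies on BeautifulSoup's `"html.parser"` builder, which keeps unrecognized tags in the tree and lowercases their names.

```python
from bs4 import BeautifulSoup

# Hypothetical batch file: <DOC> stands in for the non-HTML wrapper tag
# in the real export, whose actual name is not shown in the question.
SAMPLE = """<html><body>
<DOC NUMBER=1><p>body of the 1st document</p></DOC>
<DOC NUMBER=2><p>body of the 2nd document</p></DOC>
</body></html>"""


def split_documents(markup, wrapper="doc"):
    """Return the text of every wrapper element, in file order."""
    # html.parser keeps unknown tags and lowercases tag names,
    # so the search term must be lowercase as well.
    soup = BeautifulSoup(markup, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all(wrapper)]


documents = split_documents(SAMPLE)
for number, text in enumerate(documents, start=1):
    print(number, text)
```

If the wrapper tag still does not show up in the tree, it may be worth checking whether it sits inside a comment or a declaration in the raw file, in which case it would have to be extracted from that node's text instead.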