Reputation:
Is there any C++ code or a library that converts an HTML document to an XML document? Thanks.
Upvotes: 6
Views: 6010
Reputation:
I wanted to convert the HTML to XML so that I could parse it with libxml++, but then I found this library: http://htmlcxx.sourceforge.net/ With it I can parse both XML and HTML without any conversion.
Upvotes: 1
Reputation: 67254
If your XHTML is well-formed, then it is already XML for practical purposes.
You can load the document with any C++ XML parser and, if it parses without errors, write it back out again.
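To give a feel for what "well-formed" demands, here is a minimal sketch of a tag-balance check in plain C++. It is not a real XML parser and only tests one necessary condition: every start tag has a matching, correctly nested end tag. The function name `tagsAreBalanced` is made up for this example, and it assumes the input has no comments, CDATA sections, DOCTYPE, processing instructions, or `>` characters inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only (not from any library).
// Returns true if every start tag is closed by a matching end tag in the
// right nesting order. Self-closing tags such as <br/> count as balanced.
// Assumes: no comments, CDATA, DOCTYPE, processing instructions, or '>'
// inside attribute values.
bool tagsAreBalanced(const std::string& doc) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t i = 0;
    while ((i = doc.find('<', i)) != std::string::npos) {
        const std::size_t end = doc.find('>', i);
        if (end == std::string::npos) return false;  // unterminated tag
        const std::string tag = doc.substr(i + 1, end - i - 1);  // text between < and >
        i = end + 1;
        if (tag.empty()) return false;  // "<>"

        const bool closing = tag.front() == '/';
        const bool selfClosing = tag.back() == '/';

        // The element name runs up to the first whitespace or '/'.
        const std::size_t start = closing ? 1 : 0;
        std::size_t stop = start;
        while (stop < tag.size() &&
               !std::isspace(static_cast<unsigned char>(tag[stop])) &&
               tag[stop] != '/') {
            ++stop;
        }
        const std::string name = tag.substr(start, stop - start);
        if (name.empty()) return false;

        if (closing) {
            // An end tag must match the most recently opened element.
            if (open.empty() || open.back() != name) return false;
            open.pop_back();
        } else if (!selfClosing) {
            open.push_back(name);
        }
    }
    return open.empty();  // anything left open means the input is not balanced
}
```

For example, `tagsAreBalanced("<p>a<br/>b</p>")` is true, while the typical HTML form `<p>a<br>b</p>` fails because `<br>` is never closed. That is exactly the kind of input a strict XML parser will reject.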
Upvotes: 1
Reputation: 22210
You can take a look at the Tidy library:
Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML.
The library is written in C, so it can be called directly from C++.
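To illustrate the kind of correction involved in turning HTML into XHTML, here is a toy sketch in plain C++. It is not Tidy's API and not its algorithm; it handles only one repair, which is giving HTML void elements such as `<br>` and `<img>` an XML-style self-closing form. The function name `closeVoidElements` is made up for this example, and it assumes there are no comments, scripts, or `>` characters inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper, for illustration only. This is NOT Tidy's API.
// Rewrites HTML void elements (<br>, <img ...>, ...) as self-closing tags
// (<br />, <img ... />) so that an XML parser does not expect an end tag.
// Assumes: no comments, scripts, or '>' inside attribute values.
std::string closeVoidElements(const std::string& html) {
    static const std::set<std::string> kVoid = {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"};

    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {  // ordinary text is copied through unchanged
            out += html[i++];
            continue;
        }
        const std::size_t end = html.find('>', i);
        if (end == std::string::npos) {  // unterminated tag: copy the rest as-is
            out.append(html, i, std::string::npos);
            break;
        }
        const std::string tag = html.substr(i, end - i);  // "<name attrs", without '>'

        // Lowercased element name; stays empty for end tags such as "</p".
        std::string name;
        for (std::size_t j = 1; j < tag.size() &&
             std::isalnum(static_cast<unsigned char>(tag[j])); ++j) {
            name += static_cast<char>(
                std::tolower(static_cast<unsigned char>(tag[j])));
        }

        const bool alreadyClosed = tag.back() == '/';
        out += tag;
        if (kVoid.count(name) != 0 && !alreadyClosed) out += " /";
        out += '>';
        i = end + 1;
    }
    return out;
}
```

For example, `closeVoidElements("<p>a<br>b</p>")` gives `<p>a<br />b</p>`. A real converter like Tidy also has to deal with missing end tags, bad nesting, unquoted attributes, and entities, which is why using the library is the better option for real documents.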
Upvotes: 5