pR0Ps

Reputation: 2792

Lax HTML parsing in C++?

I'm looking for a solution for parsing potentially malformed HTML in C++, similar to what Beautiful Soup does in Python.

Normally, just using an XML parser would work, but the specific HTML in this case isn't valid XML/XHTML and can't be properly parsed.

Do libraries/tools for this exist?

Upvotes: 4

Views: 746

Answers (3)

Eugen Constantin Dinca

Reputation: 9140

According to its documentation, libxml2 is capable of parsing HTML 4.

Upvotes: 2

imaximchuk

Reputation: 748

You can use HTML Tidy to transform the HTML into valid XML (XHTML) and then use any available C++ XML parser.

Upvotes: 6

Gnu Engineer

Reputation: 1545

I've used Xerces and recommend it for C++. It supports both the DOM and SAX models.

Upvotes: -1
