Thomas

Reputation: 6196

Can you parse HTML with an XML parser?

I'm looking for an HTML parser for C++, but there seem to be only XML parsers for C++. Various sources suggest that XML parsers can parse HTML, but I can't find any concrete information confirming that an XML parser is an acceptable way to parse HTML.

If an XML parser can parse HTML, why is that possible when they're different languages? As far as I know, HTML is not a subset of XML.

Upvotes: 1

Views: 128

Answers (1)

kjhughes

Reputation: 111491

Some HTML can be parsed with an XML parser; some HTML cannot.

SGML begat both XML and HTML. Unlike XML, SGML and HTML do not universally require closing tags (among other differences), so HTML cannot be parsed by XML parsers in the general case. XHTML, on the other hand, is by definition well-formed XML and can therefore be parsed by XML parsers.
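
To make the closing-tag point concrete, here is a minimal sketch. It uses Python's standard-library `xml.etree.ElementTree` rather than a C++ library, purely so that it runs without extra dependencies; any conforming XML parser should reject the first fragment in the same way. The helper name `parses_as_xml` and both sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Valid HTML, but <br> is never closed, so it is not well-formed XML.
html_fragment = "<p>Hello<br>world</p>"

# XHTML-style spelling of the same content: <br/> is self-closing.
xhtml_fragment = "<p>Hello<br/>world</p>"

print(parses_as_xml(html_fragment))   # False
print(parses_as_xml(xhtml_fragment))  # True
```

So whether an XML parser works depends on the markup you feed it: input that happens to be well-formed (or is XHTML) goes through, while ordinary HTML with unclosed tags or unquoted attribute values does not.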

Upvotes: 2
