Reputation: 6474
I want to determine the version of HTML used by a web page. How do I do this in a Google App Engine Java application? (Or even a desktop Java application?)
Upvotes: 3
Views: 7466
Reputation: 98816
As the comments have mentioned, there isn’t much of a hard-and-fast difference between an “HTML5” HTML page and an “older” HTML page. It’s all HTML. Much of the point of HTML5 as a standard is to document how browsers already treat HTML, rather than to specify new stuff (aside from some new element names and JavaScript APIs).
If a page uses the HTML5 doctype (<!DOCTYPE html>), that’s a pretty good indication that the author intended it to be HTML5. But as the comments have mentioned, you just need a decent HTML parser: it’ll suck up older HTML and HTML5 alike, because they’re effectively the same thing as far as parsing goes.
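If the doctype is all you want to look at, a rough check needs nothing beyond the standard library, so it should work the same on App Engine or the desktop. The sketch below is only an illustration: the class name `DoctypeSniffer` and the labels it returns are made up for this example, it assumes you already have the page's markup as a `String`, and it reports what the author declared rather than what the markup actually contains.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the declared HTML version from the doctype.
class DoctypeSniffer {
    // First <!DOCTYPE ...> declaration in the text, matched case-insensitively.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE\\s+([^>]*)>", Pattern.CASE_INSENSITIVE);

    static String classify(String html) {
        Matcher m = DOCTYPE.matcher(html);
        if (!m.find()) {
            return "no doctype";
        }
        // Everything between "<!DOCTYPE " and ">", normalised for comparison.
        String decl = m.group(1).trim().toUpperCase(Locale.ROOT);
        if (decl.equals("HTML")) {
            return "HTML5";            // <!DOCTYPE html>
        }
        if (decl.contains("DTD XHTML 1.1")) {
            return "XHTML 1.1";
        }
        if (decl.contains("DTD XHTML 1.0")) {
            return "XHTML 1.0";
        }
        if (decl.contains("DTD HTML 4.01")) {
            return "HTML 4.01";
        }
        return "other doctype";       // anything this sketch does not recognise
    }

    public static void main(String[] args) {
        System.out.println(classify("<!DOCTYPE html><html><body></body></html>"));
        System.out.println(classify("<html><body>no declaration</body></html>"));
    }
}
```

This is a plain regex match, not real parsing, so a comment that happens to contain a doctype could fool it. For anything beyond a quick guess, a proper HTML parser is still the better route.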
I’ve very little experience with HTML parsers, but as robertc suggested in his comment, you might try http://about.validator.nu/htmlparser/.
Upvotes: 6