Reputation: 238
How can I parse the doctype declaration to get the HTML version from an HTML file?
Using doctype (or DOCTYPE, or !DOCTYPE) as an argument in an XPath query raises an invalid expression error.
Upvotes: 4
Views: 1092
Reputation: 27793
The doctype is not a node in the document tree, so XPath cannot select it. It belongs to the document's DTD, which Nokogiri exposes as internal_subset:
require 'rubygems'
require 'nokogiri'
html = <<EOF
<!DOCTYPE foo PUBLIC "bar" "qux">
<html>
</html>
EOF
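doc = Nokogiri::HTML(html)
# internal_subset is the DTD node that holds the doctype declaration
puts doc.internal_subset.name         # doctype name
puts doc.internal_subset.external_id  # public identifier
puts doc.internal_subset.system_id    # system identifier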
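
To get from the public identifier to an actual HTML version, something like the following might work. It is only a sketch: html_version is a hypothetical helper (not part of Nokogiri), it assumes the W3C-style identifier format "-//W3C//DTD HTML 4.01//EN", and it assumes that a doctype without a public identifier is the HTML5 one.

```ruby
# Hypothetical helper (not a Nokogiri method): derive a version label from a
# doctype's public identifier, e.g. the value of doc.internal_subset.external_id.
def html_version(public_id)
  # Assumption: no public identifier means the short HTML5 doctype (<!DOCTYPE html>).
  return 'HTML5' if public_id.nil? || public_id.empty?

  # W3C public identifiers look like "-//W3C//DTD XHTML 1.0 Strict//EN".
  match = public_id.match(%r{//DTD\s+(X?HTML)\s+([\d.]+)}i)
  # Returns nil when the identifier does not follow that pattern.
  match ? "#{match[1]} #{match[2]}" : nil
end

puts html_version('-//W3C//DTD HTML 4.01//EN')
puts html_version('-//W3C//DTD XHTML 1.0 Strict//EN')
puts html_version(nil)
```

With the parsed document above this would be called as html_version(doc.internal_subset.external_id). It is probably worth checking that doc.internal_subset is not nil first, in case the file has no doctype at all.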
Upvotes: 5