david

Reputation: 238

Parsing out the HTML doctype tag in Nokogiri

How can I parse the doctype declaration to get the HTML version of an HTML file?

Using doctype (or DOCTYPE, or !DOCTYPE) as an argument to xpath raises an invalid expression error.

Upvotes: 4

Views: 1092

Answers (1)

akuhn

Reputation: 27793

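The doctype is not an element of the document, but part of its DTD, which is why an XPath expression cannot select it. Nokogiri exposes it through internal_subset: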

require 'rubygems'
require 'nokogiri'

html = <<EOF
<!DOCTYPE foo PUBLIC "bar" "qux">
<html>
</html>
EOF

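doc = Nokogiri::HTML(html)

# internal_subset returns the DTD node built from the <!DOCTYPE ...> declaration
dtd = doc.internal_subset
puts dtd.name         # doctype name: foo
puts dtd.external_id  # public identifier: bar
puts dtd.system_id    # system identifier: qux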
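
To get from the doctype to an actual HTML version, one option is to inspect the public identifier that external_id returns. The following is only a minimal sketch: html_version_from_public_id is a hypothetical helper, not part of Nokogiri, and it assumes W3C-style identifiers of the form "-//W3C//DTD <version>//EN" and treats a doctype without a public identifier as HTML5.

```ruby
# Hypothetical helper (not a Nokogiri API): derive a readable HTML version
# from a doctype's public identifier, e.g. the value of
# doc.internal_subset.external_id.
def html_version_from_public_id(public_id)
  # Assumption: a doctype with no public identifier is the HTML5 form,
  # <!DOCTYPE html>.
  return 'HTML5' if public_id.nil? || public_id.empty?

  # W3C public identifiers look like "-//W3C//DTD HTML 4.01 Transitional//EN";
  # capture the text between "//DTD " and the next "//".
  match = public_id.match(%r{//DTD\s+(.+?)//})
  match ? match[1] : nil
end

puts html_version_from_public_id('-//W3C//DTD XHTML 1.0 Strict//EN')
puts html_version_from_public_id('-//W3C//DTD HTML 4.01//EN')
puts html_version_from_public_id(nil)
```

With the code above it would be called as html_version_from_public_id(doc.internal_subset.external_id). It returns nil for identifiers it does not recognise, such as the "bar" in the sample document.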

Upvotes: 5
