Reputation: 21249
I'm trying to use Nokogiri to parse and update some xhtml files (fixing image sizes).
The parsing and updating works well but when I save the document with:
doc.to_xhtml(:indent_text => "\t", :indent=>1, :encoding => 'UTF-8')
The first two lines change from (original):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
to (output):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<?xml version="1.0" encoding="utf-8"??>
which isn't a valid XML document (and there's also a doubled ? at the end of the XML declaration).
Am I doing something wrong?
Edit: I've got nokogiri (1.6.0) installed, which seems to be the latest version.
Upvotes: 2
Views: 567
Reputation: 7561
This is an open (though very old) Nokogiri issue on GitHub, although the underlying cause may in fact lie in libxml. I was able to reproduce your output.
The quick fix is to parse your document with Nokogiri::XML rather than Nokogiri::HTML, which is probably better practice anyway when dealing with XHTML files:
doc = Nokogiri::XML(File.open('wherever'))
doc.to_xhtml(:indent_text => "\t", :indent=>1, :encoding => 'UTF-8')
Note that this won't preserve your XML processing instruction. If you need it, use to_xml.
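If you want to guard against this regression when saving, a small check on the serialized string can help. The following is only a sketch using plain Ruby string methods. The helper name `prolog_ok?` and the sample constants are made up for illustration and are not part of Nokogiri. It relies on the XML rule that the declaration, when present, must be the very first thing in the document.

```ruby
# Hypothetical helper (not a Nokogiri API): returns true when the serialized
# document starts with the XML declaration and has no doubled "??>".
def prolog_ok?(serialized)
  serialized.start_with?('<?xml ') && !serialized.include?('??>')
end

DOCTYPE = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" ' \
          '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'

# The two prologs from the question: the original file and the broken output.
GOOD = %(<?xml version="1.0" encoding="utf-8"?>\n#{DOCTYPE}\n)
BAD  = %(#{DOCTYPE}\n<?xml version="1.0" encoding="utf-8"??>\n)

puts prolog_ok?(GOOD)
puts prolog_ok?(BAD)

# Assumed usage with the document from the answer:
#   output = doc.to_xml(:indent_text => "\t", :indent => 1, :encoding => 'UTF-8')
#   raise 'XML declaration is missing or misplaced' unless prolog_ok?(output)
```

This only inspects the prolog. It does not validate the rest of the document.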
Upvotes: 2