Nokogiri to_xhtml puts doctype before <?xml

Question

I'm trying to use Nokogiri to parse and update some XHTML files (fixing image sizes).

Parsing and updating work well, but when I serialize the document with:

doc.to_xhtml(:indent_text => "\t", :indent => 1, :encoding => 'UTF-8')

The first two lines change from (original):

to (output):

The output puts the DOCTYPE before the <?xml line, which isn't a valid XML document (and there's also a double ? at the end of the XML declaration).

Am I doing something wrong?

Edit: I've got nokogiri (1.6.0) installed, which seems to be the latest version.

Jacob Brown · Accepted Answer

This is an open (though very old) Nokogiri issue on GitHub, although it may in fact be a libxml2 issue. I was able to reproduce your output.

The quick fix is to parse your document with Nokogiri::XML rather than Nokogiri::HTML, which is probably better practice anyway when dealing with XHTML files:

require 'nokogiri'

doc = Nokogiri::XML(File.open('wherever'))
doc.to_xhtml(:indent_text => "\t", :indent => 1, :encoding => 'UTF-8')

Note that to_xhtml won't preserve the XML declaration (the <?xml ... ?> line). If you need it, use to_xml instead.
