Reputation: 501
I'm using Nokogiri to process fragments of XHTML documents, and am running into some behavior I can neither explain nor work around. I'm not sure whether it's a bug or something I don't understand.
Consider the following two lines, showcasing a reduced version of the problem I'm running into:
puts Nokogiri::XML::DocumentFragment.parse("&nbsp;<pre><div>foo</div></pre>")
puts Nokogiri::XML::DocumentFragment.parse("<pre><div>foo</div></pre>")
This is the output:
<pre>div>foo/div></pre>
<pre><div>foo</div></pre>
The second line is what I expect, but the first one puzzles me. Where did the &nbsp; go? Why does its presence cause the < characters to disappear?
Upvotes: 1
Views: 352
Reputation: 501
Based on matt's suggestion, I'm now parsing the fragment by wrapping it in a full XHTML document, as that allows Nokogiri to know about the XHTML entities.
fragment = "&nbsp;<pre><div>foo</div></pre>"
head = <<HERE
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta charset="UTF-8" />
</head>
<body>
HERE
foot = <<HERE
</body>
</html>
HERE
puts Nokogiri::XML.parse(head + fragment + foot).css("body").children.to_xml
It feels a bit heavy-handed, but it works.
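A lighter alternative, as a rough sketch only: XML itself predefines just five entities (lt, gt, amp, apos, quot), so a name like &nbsp; is undefined unless a DTD declares it. Numeric character references need no DTD, so the named entities can be rewritten before parsing. The helper name numeric_entities and the XHTML_ENTITIES table below are my own illustrative names, not part of Nokogiri, and the table is a small subset of the XHTML entity list that would need extending for real input.

```ruby
# Hypothetical lookup table: a small illustrative subset of the XHTML
# named entities, mapped to their Unicode code points.
XHTML_ENTITIES = {
  "nbsp"  => 160,
  "copy"  => 169,
  "laquo" => 171,
  "raquo" => 187,
  "mdash" => 8212
}.freeze

# Replace known named entities with numeric character references.
# Names not in the table (including XML's predefined amp, lt, gt,
# apos and quot) are left exactly as they were.
def numeric_entities(fragment)
  fragment.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    code = XHTML_ENTITIES[Regexp.last_match(1)]
    code ? format("&#%d;", code) : match
  end
end

puts numeric_entities("&nbsp;<pre><div>foo</div></pre>")
```

The resulting string can then be passed to Nokogiri::XML::DocumentFragment.parse as in the question. The trade-off is that any entity missing from the table stays undefined, whereas the DOCTYPE wrapper covers whatever the DTD declares.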
Upvotes: 1