user3771782

Reputation: 233

Convert xml to hash using Nokogiri but keep the anchor tags

I have an XML file like the one below, and I want to parse it and convert it to a Ruby hash. I tried the code shown below, but it strips out the anchor tags, so each description ends up as something like "Today is a ".

How can I convert the xml to a hash but keep the anchor tags?

Code:

@doc          = File.open(xml_file) { |f| Nokogiri::XML(f) }
data          = Hash.from_xml(@doc.to_s)

XML FILE

<blah>
  <tag>
   <name>My Name</name>
   <url>www.url.com</url>
   <file>myfile.zip</file>
   <description>Today is a <a href="www.sunny.com">sunny</a></description>
 </tag>
   <tag>
   <name>Someones Name</name>
   <url>www.url2.com</url>
   <file>myfile2.zip</file>
   <description>Today is a <a href="www.rainy.com">rainy</a></description>
 </tag>
</blah>

Upvotes: 1

Views: 608

Answers (1)

chumakoff

Reputation: 7044

The only way I see right now is to escape the HTML inside every <description> element in the document, and then call Hash.from_xml:

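require "nokogiri"
require "cgi"
require "active_support/core_ext/hash/conversions" # provides Hash.from_xml (already loaded in Rails)

doc = File.open(xml_file) { |f| Nokogiri::XML(f) }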

# escape HTML inside <description>
doc.css("description").each do |node|
  node.inner_html = CGI.escapeHTML(node.inner_html)
end

data = Hash.from_xml(doc.to_s) # => 

# {"blah"=>
#   {
#     "tag"=>[
#       {
#         "name"=>"My Name", 
#         "url"=>"www.url.com", 
#         "file"=>"myfile.zip", 
#         "description"=>"Today is a <a href=\"www.sunny.com\">sunny</a>"
#       }, 
#       {
#         "name"=>"Someones Name", 
#         "url"=>"www.url2.com", 
#         "file"=>"myfile2.zip", 
#         "description"=>"Today is a <a href=\"www.rainy.com\">rainy</a>"
#       }
#     ]
#   }
# }

Nokogiri is used here only for the HTML escaping. You don't really need it if you find another way to escape the markup. For example:

xml = File.read(xml_file)

# escape the inner HTML (maybe not the best way, just an example);
# the block form is needed so that $1 refers to the current match
xml.gsub!(/<description>(.*?)<\/description>/m) do
  "<description>#{CGI.escapeHTML($1)}</description>"
end

data = Hash.from_xml(xml)
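
To make the regex-based escaping step easier to try on its own, here is a minimal sketch that uses only the standard library. The helper name escape_inner_markup and its tag_name parameter are hypothetical, introduced just for illustration. It assumes the target elements carry no attributes, are not nested inside each other, and do not already contain escaped entities.

```ruby
require "cgi"

# Hypothetical helper (not from the original answer): escapes whatever sits
# between <tag_name> and </tag_name> so an XML parser sees plain text there.
def escape_inner_markup(xml, tag_name)
  tag = Regexp.escape(tag_name)
  # Non-greedy and multiline, so each element is matched separately even if
  # several share a line or one element spans multiple lines.
  pattern = %r{<#{tag}>(.*?)</#{tag}>}m
  # Block form: the block runs once per match, after the capture is set.
  xml.gsub(pattern) do
    "<#{tag_name}>#{CGI.escapeHTML(Regexp.last_match(1))}</#{tag_name}>"
  end
end

xml = <<~XML
  <blah>
    <tag>
      <name>My Name</name>
      <description>Today is a <a href="www.sunny.com">sunny</a></description>
    </tag>
  </blah>
XML

escaped = escape_inner_markup(xml, "description")
puts escaped
```

The escaped string can then be passed to Hash.from_xml as in the snippets above. The parser decodes the entities again, so the description value should come back with the anchor tag intact.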

Upvotes: 1
