genkilabs

Reputation: 2986

Convert a Ruby string with ampersand-hash-char-semicolon (&#NN;) sequences into an ASCII- or HTML-friendly string

Using Rails 3, I am consuming an XML feed generated by Drupal or something similar. The tags it gives me look like:

<body><![CDATA[&#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;]]></body>

So the intended markup is:

<p>This is a title<br />A subheading</p>

That could then be rendered in a view using <%= @mystring.html_safe %> or <%= raw @mystring %>. The trouble is that rendering the string this way only turns substrings like &#60; into a visible < character, so the tags appear as text instead of being interpreted as markup. I need a sort of double raw or double unescape: first decode the character references, then render the resulting tags as HTML-safe.

Does anyone know of anything like:

<%= @my_double_safed_string.html_safe.html_safe %>

Upvotes: 4

Views: 1648

Answers (1)

Blixxy

Reputation: 706

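It's well-formed XML, but they've effectively escaped the text twice in two different ways: with character references and with CDATA. Nothing inside a CDATA section gets decoded, so a single parse leaves the &#60; references in place. Still, you can decode it with Nokogiri by parsing twice, for example: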

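require 'nokogiri'

# First parse: the CDATA section comes back as plain text, references still escaped
xml = Nokogiri::XML.parse "<body><![CDATA[&#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;]]></body>"
# Second parse: wrap that text in a dummy element so the &#60;-style references get decoded
text = Nokogiri::XML.parse("<e>#{xml.text}</e>").text
#=> "<p>This is a title<br />A subheading</p>"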
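
To see why two decoding steps are needed, here's a sketch that goes the other way and rebuilds the feed's <body> from the intended markup. It assumes the generator replaced < and > with decimal references and then wrapped the result in CDATA. I haven't checked what Drupal actually does, and double_escape is just a hypothetical name for illustration.

```ruby
# Hypothetical reconstruction of the feed's double escaping (not Drupal's actual code):
# layer 1 turns < and > into decimal character references,
# layer 2 wraps the already-escaped text in a CDATA section.
def double_escape(html)
  escaped = html.gsub(/[<>]/) { |c| "&##{c.ord};" }
  "<body><![CDATA[#{escaped}]]></body>"
end

intended = "<p>This is a title<br />A subheading</p>"
body = double_escape(intended)
puts body
# prints <body><![CDATA[&#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;]]></body>

# Peeling off only the CDATA layer (roughly what a single XML parse does)
# still leaves the character references in place.
inner = body[/<!\[CDATA\[(.*)\]\]>/m, 1]
puts inner
# prints &#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;
```

Each layer needs its own decoding pass. That is why the Nokogiri version parses twice.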

Seeing as this Drupal site is spewing crazy double-escaped XML, I'd even be inclined to use a regexp. Hacks to solve a problem hacks created? IDK. Regardless:

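xml.text
#=> "&#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;"
# Replace each decimal reference with the character it encodes;
# passing an encoding lets chr handle code points above 255 instead of raising RangeError
xml.text.gsub(/&#(\d+);/) { $1.to_i.chr(Encoding::UTF_8) }
#=> "<p>This is a title<br />A subheading</p>"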
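
The feed might also contain hexadecimal references (&#x3C;) or invalid code points. A slightly more defensive version of the same regexp idea could handle those. This is a sketch: decode_char_refs is a hypothetical helper name, and it deliberately leaves named entities such as &amp; alone.

```ruby
# Hypothetical helper: decodes decimal (&#60;) and hexadecimal (&#x3C;)
# numeric character references into UTF-8 characters.
# Named entities such as &amp; are left untouched.
def decode_char_refs(str)
  str.gsub(/&#(?:(\d+)|[xX](\h+));/) do
    ref = Regexp.last_match(0)
    dec = Regexp.last_match(1)
    hex = Regexp.last_match(2)
    code = dec ? dec.to_i : hex.to_i(16)
    begin
      code.chr(Encoding::UTF_8)
    rescue RangeError
      ref # not a valid code point (e.g. a surrogate), so keep the original text
    end
  end
end

puts decode_char_refs("&#60;p&#62;This is a title&#60;br /&#62;A subheading&#60;/p&#62;")
# prints <p>This is a title<br />A subheading</p>
```

Assuming you trust the markup the feed sends, the view could then render the decoded string with <%= raw decode_char_refs(xml.text) %>.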

Hope this helps!

Upvotes: 6
