Maciek Sawicki

Reputation: 6835

Removing XML entities from string in Ruby

I am trying to parse an RSS channel with the simple-rss library.

Unfortunately, I get a lot of garbage in the description node:

 <description>&lt;p&gt;
some description

&lt;/p&gt;
 &lt;a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;version=28"&gt;(diff)&lt;/a&gt;</description>

I need to retrieve the text ("some description") and, optionally, the URL.

What is the best way to do it? A regexp? (If that is the answer, could you give me an example, please?)

Upvotes: 0

Views: 742

Answers (1)

Chirantan

Reputation: 15634

That's not garbage. It is just an HTML-escaped string: characters such as < and > have been replaced with entities like &lt; and &gt;. I am assuming that by the URL you mean the one inside the HTML tags (<a></a>). The following code should work.

require 'cgi'
description = "&lt;/p&gt; &lt;a href=\"http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;version=28\"&gt;(diff)&lt;/a&gt;"
CGI.unescapeHTML(description) # => </p> <a href="http://url.com/trac/xxx/wiki/foo?action=diff&amp;version=28">(diff)</a>

If you don't want the HTML tags, there are various ways to obtain just the URL. A simple regex for the URL should work, which I leave to you to figure out. (Hint: Google.)
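For what it's worth, here is a minimal sketch of one way to do that with a regex. The helper names extract_text and extract_url are made up for illustration, and the sample string only mirrors the snippet in the question. It assumes the text sits inside a single <p>...</p> pair and that the href is double-quoted; for anything messier, a real HTML parser is probably the safer choice.

```ruby
require 'cgi'

# Hypothetical helper: returns the stripped text of the first <p>...</p>,
# or an empty string if there is no paragraph.
def extract_text(html)
  html[/<p>(.*?)<\/p>/m, 1].to_s.strip
end

# Hypothetical helper: returns the first double-quoted href value, or nil.
# The attribute value is still HTML-escaped (&amp;), so it is unescaped once more.
def extract_url(html)
  href = html[/href="([^"]*)"/, 1]
  href && CGI.unescapeHTML(href)
end

# Sample input modelled on the <description> content from the question.
raw = "&lt;p&gt;\nsome description\n\n&lt;/p&gt;\n " \
      "&lt;a href=\"http://url.com/trac/xxx/wiki/foo?action=diff&amp;amp;version=28\"&gt;(diff)&lt;/a&gt;"

html = CGI.unescapeHTML(raw)   # first pass turns the entities back into tags

text = extract_text(html)      # => "some description"
url  = extract_url(html)       # => "http://url.com/trac/xxx/wiki/foo?action=diff&version=28"

puts text
puts url
```

Note the second unescape on the URL: the feed escapes the ampersand twice (&amp;amp;), so one pass only gets you back to &amp;.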

Upvotes: 3
