Jeremy Smith

Reputation: 15069

Clean up & style characters from text

I am getting text from a feed that has a lot of characters like:

Insignia™ 2.0 Stereo Computer Speaker System (2-Piece) - Black
4th-Generation Apple® iPod® touch

Is there an easy way to get rid of these, or do I have to anticipate which characters I want to delete and use the delete method to remove them? Also, when I try to remove

&amp;

with

str.delete("&")

it leaves behind "amp;". Is there a better way to delete this type of character? Do I need to re-encode the text?

Upvotes: 8

Views: 9029

Answers (3)

Phrogz

Reputation: 303225

If you are getting data from a 'feed', aka RSS XML, then you should be using an XML parser like Nokogiri to process the XML. This will automatically unescape HTML entities and allow you to get the proper string representation directly.
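As a rough sketch of what that looks like: Nokogiri is a third-party gem, so the example below swaps in REXML, which is bundled with standard Ruby installs, to show the same effect. The feed fragment and the `feed_title` helper are made up for illustration; a real feed will have its own layout.

```ruby
require "rexml/document"

# Hypothetical helper: returns the text of the first <title> under rss/channel/item.
def feed_title(xml)
  doc = REXML::Document.new(xml)
  doc.elements["rss/channel/item/title"].text
end

# Made-up feed fragment; the ampersand is escaped as &amp; in the XML source.
sample = "<rss><channel><item><title>Speakers &amp; Headphones</title></item></channel></rss>"

puts feed_title(sample)  # prints: Speakers & Headphones
```

The parser hands back the decoded text, so there is nothing left to strip by hand.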

Upvotes: 1

Mark Thomas

Reputation: 37517

String#delete is certainly not what you want: it treats its argument as a set of individual characters to remove, not as a substring to match as a whole.
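To make the difference concrete, here is a small sketch. The sample string and the `remove_entity` helper are only for illustration and are not from the question.

```ruby
# String#delete removes every character that appears in its argument.
"foo&amp;bar".delete("&")      #=> "fooamp;bar"  (only the "&" character goes)
"foo&amp;bar".delete("&amp;")  #=> "foobr"       (every &, a, m, p and ; goes)

# Hypothetical helper: removes the entity as a whole substring instead.
def remove_entity(str, entity)
  str.gsub(entity, "")
end

remove_entity("foo&amp;bar", "&amp;")  #=> "foobar"
```

Note that the second `delete` call also eats the "a" in "bar", which is why it is the wrong tool here.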

Try

str.gsub(/&amp;/, "")

You may also want to try replacing the &amp; with a literal ampersand, such as:

str.gsub(/&amp;/, "&")

If this is closer to what you really want, you may get the best results unescaping the HTML string. If so try this:

require "cgi"
CGI::unescapeHTML(str)

Details of the unescapeHTML method are in the CGI documentation of the Ruby standard library.
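A minimal sketch of that approach, assuming the feed text arrives as an ordinary UTF-8 string (the `clean_title` name and the sample input are hypothetical):

```ruby
require "cgi"

# Hypothetical helper: decodes HTML entities in a raw feed string.
def clean_title(raw)
  CGI.unescapeHTML(raw)
end

puts clean_title("Speakers &amp; Headphones")  # prints: Speakers & Headphones
```

As far as I know, unescapeHTML only decodes the basic named entities (&amp;, &lt;, &gt;, &quot;, &apos;) plus numeric ones, so named entities such as &trade; or &reg; would pass through unchanged and need a fuller HTML entity decoder.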

Upvotes: 24

WarHog

Reputation: 8710

To remove it, try the gsub method, something like this:

text = "foo&amp;bar"
text.gsub(/\b&amp;\b/, "")  #=> "foobar"

Upvotes: -1
