Daniel Cukier

Reputation: 11942

Convert an escaped Unicode string to its characters in Ruby 1.8

I have to read some text files with the following content:

\u201CThe Pedlar Lady of Gushing Cross\u201D

In a Ruby 1.9 console, a string literal with this content is converted as expected:

ruby-1.9.1-p378 > "\u2714 \u2714 my great string \u2714 \u2714"
 => "✔ ✔ my great string ✔ ✔" 

In Ruby 1.8, the Unicode escapes are not converted to their characters:

ree-1.8.7-2010.01 > "\u2714 \u2714 my great string \u2714 \u2714"
 => "u2714 u2714 my great string u2714 u2714" 

Is there an easy way to get the correct characters in Ruby 1.8?

Upvotes: 6

Views: 3040

Answers (3)

Pieter Müller

Reputation: 4693

This builds on @Dave's answer. I'm using the following to replace all Unicode escape sequences in a given string with the corresponding character:

string_value.gsub(/\\u([0-9a-fA-F]{4})/) {|m| [$1.hex].pack("U")}

It's a regular expression that looks for "\u" followed by four hexadecimal digits. The block discards the "\u", converts the four hex digits to an integer with String#hex, and uses Array#pack with the "U" directive to produce the UTF-8 encoded character. gsub replaces each escape sequence with that character and returns the resulting string.

It will give you trouble if your string is escaped further (e.g. by having "\" escaped as "\\"). But in the vanilla case it should work fine.
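Here is a rough sketch of the same substitution wrapped in a method, assuming the input holds literal backslash-u sequences as in the question. The method name unescape_unicode is made up for illustration. The extra alternative in the regex is my addition, not part of the one-liner above: it matches a doubled backslash first and leaves it untouched, so an escaped backslash followed by "u" is not mistaken for an escape sequence.

```ruby
# Hypothetical helper: replaces literal \uXXXX sequences with UTF-8 characters.
def unescape_unicode(text)
  # First alternative: a doubled backslash, kept as-is.
  # Second alternative: backslash, "u", and exactly four hex digits.
  text.gsub(/\\\\|\\u([0-9a-fA-F]{4})/) do |match|
    # $1 is only set when the second alternative matched.
    $1 ? [$1.hex].pack("U") : match
  end
end

# Single quotes keep the backslashes literal, like text read from a file.
line = '\u201CThe Pedlar Lady of Gushing Cross\u201D'
puts unescape_unicode(line)
```

Anything more elaborate than doubled backslashes would probably need a real parser rather than a regex.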

Upvotes: 2

Dave

Reputation: 4694

For anyone else who stumbles on this question (like me) looking for an answer, the equivalent way of doing this in Ruby 1.8 would be:

["2714".to_i(16)].pack("U*")
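Since "U*" packs every integer in the array, the same call handles several code points at once. A small sketch, where the method name codepoints_to_utf8 is only illustrative and the input is assumed to be an array of hex strings:

```ruby
# Hypothetical helper: turns hex code point strings into one UTF-8 string.
def codepoints_to_utf8(hex_codes)
  # to_i(16) parses each hex string; "U*" encodes each integer as UTF-8.
  hex_codes.map { |hex| hex.to_i(16) }.pack("U*")
end

# The left and right double quotation marks from the question's sample text.
puts codepoints_to_utf8(%w[201C 201D])
```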

Upvotes: 6

Martin v. Löwis

Reputation: 127447

The simplest approach might be to use a JSON parser, as JSON happens to use this very format:

irb(main):014:0> JSON '["\u2714 \u2714 my great string \u2714 \u2714"]'
=> ["\342\234\224 \342\234\224 my great string \342\234\224 \342\234\224"]
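As a sketch, the same idea can be wrapped in a method that puts the text inside a one-element JSON array and takes the parsed element back out. The method name unescape_via_json is hypothetical, and this assumes a JSON library is available (on Ruby 1.8 that means installing the json gem). It also assumes the text contains no unescaped double quotes, control characters, or backslashes outside valid JSON escapes, since any of those would make the parse fail.

```ruby
require 'json'

# Hypothetical helper: lets the JSON parser decode \uXXXX escapes.
def unescape_via_json(text)
  # Wrap the raw text as the only element of a JSON array, then parse it.
  JSON.parse(%Q(["#{text}"])).first
end

# Single quotes keep the backslashes literal, like text read from a file.
puts unescape_via_json('\u2714 \u2714 my great string \u2714 \u2714')
```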

Upvotes: 5
