Daniel Garmoshka

How to decode backslash-escaped special symbols to UTF-8

I have text in a database that is stored as: \xE2\x82\xAC 50

Important note: the symbols are stored in the database not as UTF-8 bytes but as literal characters: "backslash", "letter x", "letter E", etc. So the Ruby representation is "\\xE2\\x82\\xAC 50" (double backslashes, not single).

How do I convert this string to "€ 50"?

> xx = "\\xE2\\x82\\xAC"
"\\xE2\\x82\\xAC"
> xx.bytes
[92, 120, 69, 50, 92, 120, 56, 50, 92, 120, 65, 67]

None of these work:

xx.force_encoding('utf-8')
xx.encode('utf-8')
xx.force_encoding('binary').force_encoding('utf-8')
xx.encode('ASCII-8BIT').encode('utf-8')
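A quick way to see why these attempts fail is to compare the bytes before and after. This is a minimal sketch using the same `xx` as above: `force_encoding` only changes the encoding label attached to the bytes, and `encode` only transcodes characters, so neither one ever interprets the backslash escapes.

```ruby
xx = "\\xE2\\x82\\xAC"

# force_encoding relabels the existing bytes; it never interprets escapes,
# so the 12 ASCII bytes (backslash, x, E, 2, ...) stay exactly the same.
relabelled = xx.dup.force_encoding('UTF-8')
p relabelled.bytes == xx.bytes # => true
p relabelled.length            # => 12

# encode transcodes characters, but these are plain ASCII characters,
# so converting them to UTF-8 leaves the string unchanged as well.
p xx.encode('UTF-8') == xx     # => true
```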


Answers (2)

Daniel Garmoshka

For now I've only come up with an ugly "converter":

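  # Replaces each literal "\xNN" sequence (backslash, x, two hex digits)
  # with the byte it describes, then reinterprets the bytes as UTF-8.
  def fix_utf_symbols(str)
    escapes = str.scan(/\\x[0-9A-Fa-f]{2}/).uniq
    result = str.b # binary copy, so raw bytes can be spliced in safely
    escapes.each do |escape|
      result.gsub!(escape) { escape[2..3].hex.chr }
    end
    result.force_encoding('UTF-8')
  end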

More elegant solutions are welcome.
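A possibly more compact variant does the same work in a single `gsub` pass with a block. This is a sketch, not a tested drop-in replacement; `decode_hex_escapes` is a hypothetical name. It assumes the escapes always have the form backslash, `x`, two hex digits (upper or lower case), and that the decoded bytes are meant to be UTF-8.

```ruby
# Hypothetical helper: turns literal "\xNN" escapes into the raw bytes
# they describe, then labels the result as UTF-8.
def decode_hex_escapes(str)
  str.b                                              # binary copy of the input
     .gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr } # "\xE2" -> byte 0xE2
     .force_encoding(Encoding::UTF_8)                # reinterpret bytes as UTF-8
end

p decode_hex_escapes("\\xE2\\x82\\xAC 50") == "€ 50" # => true
```

The block form avoids the special meaning that backslashes have in a `gsub` replacement string. If the escapes do not spell a valid UTF-8 sequence, the result will be invalid, so checking `valid_encoding?` on it may be worthwhile.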

the Tin Man

This isn't really a Ruby thing; it's about understanding what you're seeing and how strings and escaped characters work.

Meditate on this:

"\\xE2\\x82\\xAC" # => "\\xE2\\x82\\xAC"
'\xE2\x82\xAC' # => "\\xE2\\x82\\xAC"

"\xE2\x82\xAC" # => "€"

The third form is how you define the bytes that make up the Euro sign character. The first two are two different ways of writing a string that contains literal backslashes.
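Comparing the bytes of the three literals makes the difference concrete. A minimal sketch, assuming the script is saved as UTF-8 (Ruby's default source encoding); the variable names are only for illustration:

```ruby
escaped_double = "\\xE2\\x82\\xAC" # "\\" is one literal backslash
escaped_single = '\xE2\x82\xAC'    # single quotes keep \x as two characters
euro           = "\xE2\x82\xAC"    # double-quoted \xNN inserts the byte 0xNN

p escaped_double == escaped_single # => true
p escaped_double.bytesize          # => 12
p euro.bytes                       # => [226, 130, 172]
p euro == "€"                      # => true
```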

If you've stored the data correctly in the database, it'll be retrieved correctly. The DB driver you're using is responsible for converting the stored data into the language's string type, so this should be transparent to you once you've retrieved the fields.

Current Rubies use UTF-8 by default, so there's no need to force the string to UTF-8; simply define it correctly.
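A small sketch to check this in your own environment, assuming the code is saved to a file with no magic encoding comment. It also shows the one case where `force_encoding` does help: when the bytes are already right and only the encoding label is wrong (here simulated with `String#b`, which copies the bytes and labels them ASCII-8BIT).

```ruby
# With no magic comment, Ruby 2.0+ treats source files as UTF-8.
p __ENCODING__                            # => #<Encoding:UTF-8>
euro = "€ 50"
p euro.encoding == Encoding::UTF_8        # => true

# Same bytes, wrong label: force_encoding fixes the label, nothing more.
raw = euro.b
p raw.encoding == Encoding::ASCII_8BIT    # => true
p raw.force_encoding('UTF-8') == euro     # => true
```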

Dealing with character escaping in strings will be confusing until you learn the special cases and how single-quoted strings behave differently from double-quoted ones. You can find more information about escaping in Wikipedia's "Escape character" article. The information applies to almost every language out there, not just Ruby.
