Reputation: 6352
I have text in a database which is stored as: \xE2\x82\xAC 50
Important note: the symbols are stored in the database not as UTF-8 bytes but as literal characters ("backslash", "letter x", "letter E", etc.), so the Ruby representation is "\\xE2\\x82\\xAC 50" (double backslashes, not single).
How do I convert this string to € 50?
> xx = "\\xE2\\x82\\xAC"
"\\xE2\\x82\\xAC"
> xx.bytes
[92, 120, 69, 50, 92, 120, 56, 50, 92, 120, 65, 67]
None of these work:
xx.force_encoding('utf-8')
xx.encode('utf-8')
xx.force_encoding('binary').force_encoding('utf-8')
xx.encode('ASCII-8BIT').encode('utf-8')
Upvotes: 0
Views: 1103
Reputation: 6352
For now I have only come up with an ugly "converter":
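def fix_utf_symbols(str)
  # Collect every literal "\xNN" escape sequence (uppercase hex digits only).
  match = str.scan(/(\\x[0-9A-F]{2})/)
  match.flatten.each do |ascii_code|
    # Convert the two hex digits into the single byte they denote.
    utf_char = ascii_code[2..3].hex.chr
    # Replace the escape sequence in place; note that this mutates the argument.
    str.gsub!(ascii_code, utf_char)
  end
  str
end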
More elegant solutions are welcome.
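One possible tidier variant, offered only as a sketch: do the substitution in a single gsub pass on a binary copy of the string, then relabel the result as UTF-8. The method name unescape_hex_bytes is made up for illustration, and the sketch assumes every \xNN sequence in the text is meant to be a raw byte of a UTF-8 character.

```ruby
# Hypothetical helper; the name is not from the original post.
def unescape_hex_bytes(text)
  # Work on a binary copy so raw bytes can be spliced in and the caller's
  # string is left untouched.
  binary = text.dup.force_encoding(Encoding::BINARY)
  # Replace each literal "\xNN" (either hex case) with the single byte NN.
  decoded = binary.gsub(/\\x([0-9A-Fa-f]{2})/) { $1.hex.chr }
  # The bytes are now real UTF-8 data, so label them as such.
  decoded.force_encoding(Encoding::UTF_8)
end

unescape_hex_bytes('\xE2\x82\xAC 50') # => "€ 50"
```

If the stored text can also contain escapes that do not form valid UTF-8, checking valid_encoding? on the result before using it would be a sensible extra step.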
Upvotes: 1
Reputation: 160601
It's not an "in Ruby" thing; it's about understanding what you're seeing and how strings and escaped characters work.
Meditate on this:
"\\xE2\\x82\\xAC" # => "\\xE2\\x82\\xAC"
'\xE2\x82\xAC' # => "\\xE2\\x82\\xAC"
"\xE2\x82\xAC" # => "€"
The third way is how to define the bytes that create the Euro symbol character. The first two are two different ways of writing the string with literal backslashes.
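To make the difference concrete, here is a small sketch comparing the byte counts of the two kinds of literal. The variable names are illustrative only, and it assumes the source file uses Ruby's default UTF-8 encoding.

```ruby
escaped = '\xE2\x82\xAC'   # twelve ASCII characters: backslash, x, E, 2, ...
euro    = "\xE2\x82\xAC"   # three bytes that together form one UTF-8 character

escaped.bytesize # => 12
euro.bytesize    # => 3
euro.length      # => 1
euro.bytes       # => [226, 130, 172]
euro == "€"      # => true
```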
If you've stored the data correctly in the database it'll be retrieved correctly. The DB driver you're using is responsible for converting to the string used by the language, so it should be transparent to you once you've retrieved the fields.
Current Rubies use UTF-8 by default, so it's not necessary to try to force the string to UTF-8; simply define it correctly.
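A short sketch of why forcing the encoding cannot help here: force_encoding only changes the label attached to the bytes, never the bytes themselves, so the literal backslashes stay exactly as they were. The variable names are illustrative, and the encoding shown assumes a source file in the default UTF-8.

```ruby
xx = '\xE2\x82\xAC'.dup        # literal backslashes, as stored in the database

xx.encoding                    # => #<Encoding:UTF-8>
bytes_before = xx.bytes

# force_encoding relabels the string; the underlying bytes do not change.
xx.force_encoding('binary')
xx.force_encoding('utf-8')

xx.bytes == bytes_before       # => true
xx                             # => "\\xE2\\x82\\xAC"
```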
Dealing with character escaping in strings will be confusing until you learn the special cases and how single-quoted strings behave differently than double-quoted. You can find more information about escaping by reading Wikipedia's "Escape character" article. The information applies to almost every language out there, not just Ruby.
Upvotes: 1