user1969191

Reputation: 282

Data Conversion in Ruby

I am having trouble with these conversions:

string = "test \\ud83d\\ude01" # into '1f601' and vice versa

unicode_value = 'U+1F601' # into the string '\\ud83d\\ude01'

I have tried this method to encode it:

string.encode('utf-8') # output is still "test \\ud83d\\ude01"

I also tried this one:

string.force_encoding('utf-8') # output is still "test \\ud83d\\ude01"

Thanks

Upvotes: 3

Views: 229

Answers (1)

Eric Duminil

Reputation: 54223

Hex to Unicode Char

"\ud83d\ude01" to smiley

According to this table, "\\ud83d\\ude01" looks like UTF-16 (hex): the two 16-bit code units form a surrogate pair that encodes a single character. Note that the string itself is plain ASCII: ["\\", "u", "d", "8", "3", "d", "\\", "u", "d", "e", "0", "1"]

str = "\\ud83d\\ude01"
hex = str.gsub("\\u",'')

smiley = [hex].pack('H*').force_encoding('utf-16be').encode('utf-8')
puts smiley
#=> 😁
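
To make the decoding step less opaque, here is a minimal sketch of the arithmetic that turns a UTF-16 surrogate pair into a code point. The helper name surrogate_pair_to_codepoint is hypothetical (it is not part of Ruby's standard library), and the sketch assumes the input contains exactly one valid pair.

```ruby
# Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
def surrogate_pair_to_codepoint(high, low)
  unless (0xD800..0xDBFF).cover?(high) && (0xDC00..0xDFFF).cover?(low)
    raise ArgumentError, 'not a valid surrogate pair'
  end
  # Each surrogate carries 10 bits of the value (code point - 0x10000).
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# Extract the two 16-bit code units from the escaped ASCII string.
high, low = "\\ud83d\\ude01".scan(/\\u(\h{4})/).flatten.map(&:hex)
codepoint = surrogate_pair_to_codepoint(high, low)

puts format('U+%04X', codepoint) # prints U+1F601
puts [codepoint].pack('U')       # prints 😁
```

Here 0xD83D contributes 0x3D << 10 = 0xF400 and 0xDE01 contributes 0x201, so the result is 0x10000 + 0xF400 + 0x201 = 0x1F601.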

'U+1F601' to smiley

This is a Unicode code point written in hexadecimal (not UTF-8 bytes). Note that "U+1F601" is also a plain ASCII string: ["U", "+", "1", "F", "6", "0", "1"]

unicode_value = 'U+1F601'
hex = unicode_value.sub('U+','')
smiley = hex.to_i(16).chr('UTF-8')
puts smiley
#=> 😁

Code point hex ⟷ UTF-16 hex

Combining both methods above:

"\ud83d\ude01" to 'U+1F601'

str = "\\ud83d\\ude01"
utf16_hex = str.gsub("\\u",'')
smiley = [utf16_hex].pack('H*').force_encoding('utf-16be').encode('utf-8')
codepoint_hex = smiley.ord.to_s(16).upcase # String#ord returns the Unicode code point
new_str = "U+#{codepoint_hex}"
puts new_str
#=> U+1F601

'U+1F601' to "\ud83d\ude01"

unicode_value = 'U+1F601'
hex = unicode_value.sub('U+','')
smiley = hex.to_i(16).chr('UTF-8')
# smiley is already UTF-8, so no force_encoding is needed before transcoding
puts smiley.encode('utf-16be').unpack('H*').first.gsub(/(....)/,'\u\1')
#=> \ud83d\ude01

There might be an easier way to do this, but I couldn't find it.
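
For comparison, the reverse direction can also be computed by hand instead of transcoding. The following is only a sketch: codepoint_to_utf16_escape is a hypothetical helper name, and it assumes the input is a valid code point outside the surrogate range 0xD800..0xDFFF.

```ruby
# Hypothetical helper: format a code point as \uXXXX escapes (UTF-16 code units).
def codepoint_to_utf16_escape(codepoint)
  # Code points below 0x10000 fit in a single 16-bit unit.
  return format('\\u%04x', codepoint) if codepoint < 0x10000

  # Otherwise split the 20-bit offset into two 10-bit halves.
  offset = codepoint - 0x10000
  high = 0xD800 + (offset >> 10)
  low  = 0xDC00 + (offset & 0x3FF)
  format('\\u%04x\\u%04x', high, low)
end

codepoint = 'U+1F601'.sub('U+', '').hex
puts codepoint_to_utf16_escape(codepoint) # prints \ud83d\ude01
```

For U+1F601 the offset is 0xF601, which splits into 0x3D and 0x201, giving 0xD83D and 0xDE01.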

Using this code

def utf16_hex_to_unicode_char(utf16_hex)
  hex = utf16_hex.gsub("\\u",'')
  [hex].pack('H*').force_encoding('utf-16be').encode('utf-8')
end

def replace_all_utf16_hex(string)
  string.gsub(/(\\u[0-9a-fA-F]{4}){2}/){|hex| utf16_hex_to_unicode_char(hex)}
end

puts replace_all_utf16_hex("Hello \\ud83d\\ude01, I just bought a \\uD83D\\uDC39")
#=> "Hello 😁, I just bought a 🐹"
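
The regex above only matches escapes that come in groups of exactly two, so a lone escape such as \\u00e9 is left untouched. As a possible variant (a sketch, with the hypothetical name replace_utf16_escapes), each run of consecutive escapes can be decoded as one UTF-16BE sequence. This version does not try to handle unpaired surrogates.

```ruby
# Hypothetical variant: decode every run of \uXXXX escapes, whatever its length.
def replace_utf16_escapes(string)
  string.gsub(/(?:\\u\h{4})+/) do |run|
    # Turn the run into an array of 16-bit integers (UTF-16 code units).
    code_units = run.scan(/\\u(\h{4})/).flatten.map(&:hex)
    # Pack them as big-endian 16-bit values and transcode to UTF-8.
    code_units.pack('n*').force_encoding('UTF-16BE').encode('UTF-8')
  end
end

puts replace_utf16_escapes("caf\\u00e9 \\ud83d\\ude01 \\uD83D\\uDC39")
# prints café 😁 🐹
```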

Upvotes: 5
