patrickdavey

Reputation: 2076

Convert a string (representing UTF-8 hex) to string

I have a string in UTF-8 hex like this:

s

I want to convert this into an actual UTF-8 string. It should read:

Your credit has gone below 5 dollars. If you have an Add-On or Bonus your resources will work until exhausted. To top up now visit vodafone.co.nz/topup

This works:

s.scan(/.{4}/).map { |a| [a.hex].pack('U') }.join

but I'm wondering whether there's a better way to do this, for example with Encoding::Converter#convert.

Upvotes: 1

Views: 3170

Answers (3)

vol7ron

Reputation: 42109

If you intend to use this on other oddly encoded strings, you could strip the leading padding byte from each four-digit group (this only works while that byte is always 00):

[s.gsub(/..(..)/,'\1')].pack('H*')

Or keep the padding and decode each four-digit group as a character code:

s.gsub(/..../){|p|p.hex.chr}
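One caveat with the line above: Integer#chr without an argument only covers values up to 255 and returns a one-byte string, so it is fine for the ASCII text in the question but not for wider characters. A minimal sketch of a variant that passes an explicit encoding follows; the method name decode_code_units and the sample strings are hypothetical, standing in for the question's s.

```ruby
# Hypothetical helper: decode each 4-hex-digit group as one Unicode code point.
# Assumes every group is a complete BMP code point (no surrogate pairs).
def decode_code_units(hex)
  hex.gsub(/..../) { |unit| unit.hex.chr(Encoding::UTF_8) }
end

puts decode_code_units('00480069')   # "Hi"
puts decode_code_units('004800e9')   # "H" followed by e-acute
```

Characters outside the Basic Multilingual Plane arrive as surrogate pairs in UTF-16, and chr(Encoding::UTF_8) raises RangeError on a lone surrogate, so this sketch is not a general UTF-16 decoder.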

If you want to use Encoding::Converter:

ec = Encoding::Converter.new('UTF-16BE','UTF-8')     # save converter for reuse
ec.convert( [s].pack('H*') )                         # or:  ec.convert [s].pack'H*'
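To make the reuse concrete, here is a small sketch that runs one converter over several inputs. The hex samples are hypothetical stand-ins for the question's s, and each is assumed to be a complete, even-length UTF-16BE sequence.

```ruby
# Hypothetical samples: "Hi" and "Ok" as UTF-16BE hex.
samples = %w[00480069 004f006b]

# Build the converter once (source encoding first, destination second).
ec = Encoding::Converter.new('UTF-16BE', 'UTF-8')

# pack('H*') turns the hex digits into raw bytes; convert transcodes them.
decoded = samples.map { |hex| ec.convert([hex].pack('H*')) }

p decoded
p decoded.map(&:encoding)
```

For a single string this buys little over force_encoding plus encode; the converter mostly pays off when many strings are decoded in a loop.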

Upvotes: 1

matt

Reputation: 79733

The extra 00s suggest that the string is actually the hex representation of a UTF-16 string rather than UTF-8. Assuming that is the case, getting a UTF-8 string takes three steps. First, convert the string into the actual bytes that the hex digits represent (Array#pack can be used for this). Second, mark those bytes as being in the appropriate encoding with force_encoding (it looks like UTF-16BE). Finally, use encode to convert the result to UTF-8:

[s].pack('H*').force_encoding('utf-16be').encode('utf-8')
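The three steps can be spelled out one at a time. This is only a sketch: the method name utf16be_hex_to_utf8 and the short samples are hypothetical, since the question's full string is not reproduced here.

```ruby
# Hypothetical helper wrapping the one-liner above.
def utf16be_hex_to_utf8(hex)
  # Step 1: each pair of hex digits becomes one raw byte (binary string).
  bytes = [hex].pack('H*')
  # Step 2: relabel the bytes as UTF-16BE; no bytes are changed here.
  utf16 = bytes.force_encoding(Encoding::UTF_16BE)
  # Step 3: transcode, which rewrites the bytes as UTF-8.
  utf16.encode(Encoding::UTF_8)
end

puts utf16be_hex_to_utf8('00480069')   # "Hi"
puts utf16be_hex_to_utf8('d83dde00')   # a surrogate pair, decoded as one character
```

Because encode understands UTF-16 surrogate pairs, this route also handles characters outside the Basic Multilingual Plane, which the four-digits-at-a-time approaches do not.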

Upvotes: 5

Myst

Reputation: 19221

I think there are extra null characters all along the string (it's valid, but wasteful), but you can try:

[s].pack('H*').force_encoding('utf-8')

With that, the text does seem to read "Your credit has gone below 5 dollars"...

The string prints with puts, but when I dump the string I can't read all the Unicode characters in my terminal.
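A short sketch of what relabeling the bytes as UTF-8 actually yields, using a hypothetical sample ("Hi" as UTF-16BE hex) in place of the question's s. The null bytes stay in the string, which is why it can look right in a terminal yet compare unequal to the expected text.

```ruby
# Hypothetical sample standing in for the question's string.
raw = ['00480069'].pack('H*')

# Relabel only: the four bytes are untouched, including the two NULs.
as_utf8 = raw.dup.force_encoding('UTF-8')

p as_utf8.valid_encoding?   # true, since NUL is a legal UTF-8 character
p as_utf8.length            # 4, not 2
p as_utf8 == 'Hi'           # false

# Removing the NULs recovers the text, but only for pure-ASCII content.
p as_utf8.delete("\0")      # "Hi"
```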

Upvotes: 2
