gruuuvy

Reputation: 2129

How to decode a Chinese hex string into Chinese characters in Ruby or JavaScript?

I am working on a Rails app.

I am using an API that returns some Chinese provinces. The API returns the provinces in hex strings, for example:

{ "\xE5\x8C\x97\xE4\xBA\xAC" => "some data" }

My JavaScript calls a controller that returns this hash. I put all the province strings into a dropdown, but the strings show up as a black diamond with a question mark in the middle. How do I convert the Ruby hex string into the actual Chinese characters, 北京? Or, if possible, can I convert the hex string into Chinese characters in JavaScript?
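The \x.. sequences are not a special hex format: they are how Ruby's inspect displays the bytes of a string it is treating as binary rather than as text. A minimal sketch that reproduces the question's data, assuming the API client hands back binary (ASCII-8BIT) strings; api_response is a hypothetical name:

```ruby
# Rebuild the six raw bytes from the question.
# Array#pack('C*') always returns a binary (ASCII-8BIT) string.
key = [0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC].pack('C*')

# Hypothetical stand-in for the hash the API client returns.
api_response = { key => 'some data' }

p key.encoding == Encoding::BINARY  # true: Ruby does not know these bytes are text
p key                               # "\xE5\x8C\x97\xE4\xBA\xAC"
p api_response.keys.first.bytesize  # 6
```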

Upvotes: 5

Views: 3093

Answers (2)

mu is too short

Reputation: 434685

The bytes \xE5\x8C\x97 are the UTF-8 representation of 北 (U+5317) and \xE4\xBA\xAC is the UTF-8 representation of 京 (U+4EAC). So this string:

"\xE5\x8C\x97\xE4\xBA\xAC"

is 北京 if the bytes are interpreted as UTF-8. That you're seeing hex codes instead of Chinese characters suggests that the string's encoding is binary:

> s = "\xE5\x8C\x97\xE4\xBA\xAC"
 => "北京" 
> s.encoding
 => #<Encoding:UTF-8> 
> s.force_encoding('binary')
 => "\xE5\x8C\x97\xE4\xBA\xAC"
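The byte-to-character mapping can be checked by hand. Both characters use the three-byte UTF-8 form 1110xxxx 10xxxxxx 10xxxxxx, so the code point is the low 4 bits of the first byte followed by the low 6 bits of each of the other two. A small sketch; decode_utf8_3byte is a hypothetical helper that handles only this three-byte form, not general UTF-8:

```ruby
# Decode one 3-byte UTF-8 sequence: 1110xxxx 10xxxxxx 10xxxxxx.
def decode_utf8_3byte(b1, b2, b3)
  ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
end

bei  = decode_utf8_3byte(0xE5, 0x8C, 0x97)
jing = decode_utf8_3byte(0xE4, 0xBA, 0xAC)

puts format('U+%04X U+%04X', bei, jing)  # U+5317 U+4EAC
puts [bei, jing].pack('U*')              # 北京

# Cross-check against Ruby's built-in UTF-8 decoder.
raw = [0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC].pack('C*')
p raw.force_encoding(Encoding::UTF_8).codepoints == [bei, jing]  # true
```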

So this API you're talking to is speaking UTF-8 but somewhere your application is losing track of what encoding that string is supposed to be. If you force the encoding to be UTF-8 then the problem goes away:

> s.force_encoding('utf-8')
 => "北京" 
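force_encoding does not convert anything; it only changes the encoding label Ruby attaches to the same bytes. That is why it works here, while encode (which transcodes from one encoding to another) does not. A short sketch, assuming the bytes really are UTF-8, which valid_encoding? confirms:

```ruby
s = [0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC].pack('C*')  # binary, as in the question

s.force_encoding(Encoding::UTF_8)  # relabel in place; the bytes are untouched
p s.bytesize         # 6 (same bytes as before)
p s.length           # 2 (now counted as two characters)
p s.valid_encoding?  # true: the bytes are well-formed UTF-8

# By contrast, encode tries to transcode binary -> UTF-8 and fails,
# because bytes above 0x7F have no defined meaning in binary.
encode_error =
  begin
    [0xE5].pack('C*').encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end
p encode_error  # Encoding::UndefinedConversionError
```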

You should fix this encoding problem at the very edge of your application, where it reads data from the remote API. Once that's done, everything you care about should be sensible UTF-8. That should fix your JavaScript problem as well, since JavaScript is quite happy to work with UTF-8.
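One way to do that, sketched under the assumption that the API always sends UTF-8: walk the parsed response once, right after it is fetched, and relabel every string. utf8ify is a hypothetical helper, not part of Rails or Ruby; the controller can then render the cleaned hash as JSON (for example with render json: clean).

```ruby
require 'json'

# Hypothetical helper: recursively relabel every String in a parsed API
# response as UTF-8. Assumes the remote API really sends UTF-8 bytes.
def utf8ify(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), h| h[utf8ify(k)] = utf8ify(v) }
  when Array
    obj.map { |v| utf8ify(v) }
  when String
    str = obj.dup.force_encoding(Encoding::UTF_8)  # dup: hash keys are frozen
    raise ArgumentError, "not valid UTF-8: #{obj.inspect}" unless str.valid_encoding?
    str
  else
    obj
  end
end

raw = { [0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC].pack('C*') => 'some data' }
clean = utf8ify(raw)

puts clean.keys.first       # 北京
puts JSON.generate(clean)   # {"北京":"some data"}
```

Raising on invalid bytes is a deliberate choice in this sketch: a wrong encoding assumption then fails loudly at the boundary instead of turning into black diamonds in the dropdown.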

Upvotes: 4

scottxu

Reputation: 933

I think you can do it like this. In Ruby:

2.1.2 :002 > require 'uri'
 => true
2.1.2 :003 > URI.decode("\xE5\x8C\x97\xE4\xBA\xAC")
 => "北京"

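In JavaScript: decodeURIComponent(URIstring). Note that decodeURIComponent expects percent-escapes, e.g. decodeURIComponent('%E5%8C%97%E4%BA%AC') returns '北京'.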
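URI.decode only ever decoded percent-escapes such as %E5%8C%97, and it was removed in Ruby 3.0 along with URI.unescape. In the 2.1.2 session, the double-quoted literal "\xE5\x8C\x97\xE4\xBA\xAC" is already the UTF-8 string 北京 once Ruby parses it, so the call simply returns it unchanged. If the data does arrive percent-encoded, here is a sketch using the current standard library, assuming the escapes encode UTF-8 bytes (the escaped input below is hypothetical):

```ruby
require 'uri'

# Percent-encoded form of the same six UTF-8 bytes (hypothetical input).
escaped = '%E5%8C%97%E4%BA%AC'

# decode_www_form_component turns %XX escapes back into bytes and tags the
# result as UTF-8 by default. Note that it also turns '+' into a space.
decoded = URI.decode_www_form_component(escaped)

puts decoded                           # 北京
p decoded.encoding == Encoding::UTF_8  # true
```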

Upvotes: 0
