Liviu Răican

Reputation: 11

AS3: Conversion to GBK charset

Using Flex (and HTTPService), I am loading data from a URL that serves its content in the GBK charset. A good example of such a URL is this one.

A browser recognizes that the data is in the GBK charset and correctly displays the Chinese characters where they appear. Flex, however, holds the data in a different charset, and it ends up looking like this:

({"q":"tes","p":false,"bs":"","s":["ÌØ˹À­","ÌØÊâ·ûºÅ","test","ÌØÊâÉí·Ý","tesco","ÌØ˹À­Æû³µ","ÌØÊÓÍø","ÌØÊâ·ûºÅͼ°¸´óȫ","testin","ÌØ˹À­Æ󳵼۸ñ"]});

I need to convert the text to the same character string that browsers display. So far I have been using a ByteArray, with the best result coming from "iso-8859-1":

import flash.utils.ByteArray;

// Re-encode the received string as single bytes, then decode those bytes as GBK.
var convert:String;
var byte:ByteArray = new ByteArray();
byte.writeMultiByte(event.result as String, "iso-8859-1");
byte.position = 0;
convert = byte.readMultiByte(byte.bytesAvailable, "gbk");

This creates the following string, which is very close to the browser result but not entirely:

({"q":"tes","p":false,"bs":"","s":["特?拉","特殊符号","test","特殊身份","tesco","特?拉汽车","特视网","特殊符号?案大?","testin","特?拉????]});

Some characters are still replaced by "?" marks. When I copy the browser result into Flex and print it, it is displayed correctly, so this is not a matter of unsupported characters in Flash trace or anything like that.

Interesting fact: Notepad++ gives the same close-but-not-quite result as the ByteArray approach in Flex. Also, when I convert the correct/expected string from gbk to iso-8859-1 in Notepad++, I get a slightly different string than the one Flex gets from the URL:

({"q":"tes","p":false,"bs":"","s":["ÌØ˹À­","ÌØÊâ·ûºÅ","test","ÌØÊâÉí·Ý","tesco","ÌØ˹À­Æû³µ","ÌØÊÓÍø","ÌØÊâ·ûºÅͼ°¸´óÈ«","testin","ÌØ˹À­Æû³µ¼Û¸ñ"]});

It seems to me that this is the string Flex should be getting for the ByteArray approach to produce the correct result (the one visible in browsers). I see three possible causes:

  1. Something is happening to the data coming from the URL to Flex, causing it to be slightly different (unlikely)
  2. The received charset is not actually iso-8859-1, but another similar charset
  3. I don't have a complete grasp of the difference between encoding and charset, so maybe this keeps me from understanding the problem.

Any help/idea would be greatly appreciated. Thank you.

Upvotes: 0

Views: 280

Answers (1)

Liviu Răican

Reputation: 11

I managed to find the problem and a solution; I hope this helps someone else in the future.

It turns out that HTTPService automatically converts the result into a String, and that conversion may merge some pairs of bytes into single characters. That is why I was getting the first result (see above) instead of the third one. What I needed was the result in binary form. HTTPService has no such resultFormat, but URLLoader does.
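To illustrate the byte-pair merging described above, here is a minimal sketch. It assumes the automatic conversion treats valid two-byte sequences as UTF-8, which is consistent with the garbled strings in the question but is not confirmed by the HTTPService documentation. In GBK, 斯 (the second character of 特斯拉) is the byte pair 0xCB 0xB9, and that pair also happens to be a valid UTF-8 sequence:

```actionscript
import flash.utils.ByteArray;

// GBK encodes 斯 as the two bytes 0xCB 0xB9.
var raw:ByteArray = new ByteArray();
raw.writeByte(0xCB);
raw.writeByte(0xB9);

// Decoded as GBK, the two bytes form the intended character.
raw.position = 0;
var asGbk:String = raw.readMultiByte(raw.length, "gbk");
trace(asGbk);

// Decoded as UTF-8, the same two bytes collapse into one unrelated
// character, U+02F9 (the "˹" visible in the first garbled string).
raw.position = 0;
var asUtf8:String = raw.readUTFBytes(raw.length);
trace(asUtf8.length, asUtf8.charCodeAt(0).toString(16)); // 1 2f9
```

U+02F9 has no representation in iso-8859-1, so writing it back with writeMultiByte cannot restore the original two bytes. That would explain the "?" marks left by the round trip in the question.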

  1. Replace HTTPService with URLLoader
  2. Set the dataFormat property of the URLLoader to URLLoaderDataFormat.BINARY
  3. After loading, the data property holds a ByteArray. Tracing this byte array (or converting it into a String) displays the same wrong result that HTTPService produces, but the byte array itself holds the correct data, byte for byte (its length property is slightly larger than the length of the converted string).
  4. Read the string from this byte array using the "gbk" charset:

    byteArray.readMultiByte(byteArray.length, "gbk");

This returns the correct string, which the browser is also displaying.
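Put together, the four steps might look like the sketch below. The URL is a placeholder and the handler names (onComplete, onIoError) are made up for illustration; in a Flex project this code would sit inside a Script block or a class rather than at the top level.

```actionscript
import flash.events.Event;
import flash.events.IOErrorEvent;
import flash.net.URLLoader;
import flash.net.URLLoaderDataFormat;
import flash.net.URLRequest;
import flash.utils.ByteArray;

// Placeholder: substitute the GBK-encoded URL being loaded.
var request:URLRequest = new URLRequest("http://example.com/gbk-endpoint");

var loader:URLLoader = new URLLoader();
// Ask for raw bytes so that no charset conversion happens during loading.
loader.dataFormat = URLLoaderDataFormat.BINARY;
loader.addEventListener(Event.COMPLETE, onComplete);
loader.addEventListener(IOErrorEvent.IO_ERROR, onIoError);
loader.load(request);

function onComplete(event:Event):void {
    // With BINARY data format, the data property is a ByteArray.
    var bytes:ByteArray = URLLoader(event.target).data as ByteArray;
    bytes.position = 0;
    // Decode the untouched bytes as GBK.
    var text:String = bytes.readMultiByte(bytes.bytesAvailable, "gbk");
    trace(text);
}

function onIoError(event:IOErrorEvent):void {
    trace("Load failed: " + event.text);
}
```

Resetting position and reading bytesAvailable is a defensive choice so that the whole payload is decoded even if something has already read from the array.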

Upvotes: 1
