Reputation: 54212
Someone sent me a sentence made up of "monster characters" (garbled text):
æ��該å��è¬�: å�¨å®¶è£¡æ�¯ä¸�å�¯ä»¥è¬�æ°�主ç��ã��æ��以, æ��æ��ç�¶å�ºç�¾ç³¾ç´�, ä¸�ä½�ä¸�å�¯ä»¥ç��é �, é��å�¯è�½æ��ç¯�å¤�ç��æ��... ä½�大ç��, å�¯æ��è®�ä¸�è®�...
Is there any way to decode it back into normal characters?
Upvotes: 0
Views: 857
Reputation: 37898
It is, theoretically, possible.
You can reverse various encodings; there is a tool that does this for Russian text, for instance.
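A minimal Python sketch of what "reversing an encoding" means, assuming (this is not confirmed for the string in the question) that the text is UTF-8 that was mistakenly decoded as Latin-1. You re-encode with the wrong codec to get the raw bytes back, then decode those bytes with the right one. The sample phrase is made up for the demonstration.

```python
# Hypothetical sample: a Chinese phrase, damaged the usual way.
original = "在家裡"
garbled = original.encode("utf-8").decode("latin-1")  # UTF-8 bytes read as Latin-1

# Reverse the damage: the wrong codec gives back the raw bytes,
# and the right codec turns those bytes into the intended text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired)  # 在家裡
```

Because Latin-1 assigns a character to every byte value 0 to 255, this round trip is lossless, but only if no bytes were dropped or replaced on the way.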
Of course, it would be much better to do this automatically. That should be feasible, because programs like Microsoft Word do something similar when they open a file. If you open a binary file in Word, it sometimes asks you to choose an encoding because it couldn't determine one, and it shows a list of the most likely candidates.
I presume this is done by checking statistics about character occurrences. For example, in English "e" and "t" occur much more often than "q" and "j". This has long been known: it is why Morse code encodes "e" as a single dot and "t" as a single dash, while "q" and "j" take four dots and dashes each.
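The frequency idea can be sketched in Python as a score: the average English letter frequency of the characters in a candidate string, so text full of "e" and "t" scores high and garbage scores low. The table holds rounded, approximate percentages (an assumption; any published English table would do), and `english_score` is a hypothetical name.

```python
from collections import Counter

# Approximate relative frequencies (percent) of letters in English text.
# Assumption: rounded textbook values, only used to rank candidates.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
    "q": 0.1, "z": 0.07,
}

def english_score(text: str) -> float:
    """Average expected frequency of the characters in text.
    Non-letters score 0, so mojibake symbols drag the average down."""
    if not text:
        return 0.0
    counts = Counter(text.lower())
    total = sum(counts.values())
    return sum(ENGLISH_FREQ.get(ch, 0.0) * n for ch, n in counts.items()) / total

print(english_score("the tree") > english_score("jqxz jqxz"))  # True
```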
So a hypothetical tool for this would probably try many encoding combinations (a lot of them!) and check which result looks most like a real language.
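A rough Python sketch of that brute-force search, assuming the text is some real encoding that was misread as Latin-1 or cp1252. The codec lists, the function names, and the scoring rule (share of CJK ideographs, a crude stand-in for "looks like a real language" that only suits Chinese) are hypothetical choices for illustration.

```python
# Hypothetical candidate lists: the codec the text was wrongly decoded with,
# and the codec it was really written in.
WRONG_CODECS = ["latin-1", "cp1252"]
RIGHT_CODECS = ["utf-8", "big5", "gbk"]

def cjk_ratio(text: str) -> float:
    """Fraction of characters in the main CJK Unified Ideographs block."""
    if not text:
        return 0.0
    return sum("\u4e00" <= ch <= "\u9fff" for ch in text) / len(text)

def brute_force_repair(garbled: str) -> tuple:
    """Try every (wrong, right) codec pair; return (wrong, right, text)
    for the best-scoring result, or ("", "", garbled) if nothing helps."""
    best = ("", "", garbled)
    best_score = cjk_ratio(garbled)
    for wrong in WRONG_CODECS:
        for right in RIGHT_CODECS:
            try:
                candidate = garbled.encode(wrong).decode(right)
            except UnicodeError:
                continue  # this pair cannot explain the bytes
            score = cjk_ratio(candidate)
            if score > best_score:  # strict '>' keeps the earliest pair on ties
                best, best_score = (wrong, right, candidate), score
    return best

garbled = "在家裡".encode("utf-8").decode("latin-1")
print(brute_force_repair(garbled))  # ('latin-1', 'utf-8', '在家裡')
```

With two lists of sizes m and n this tries m × n pairs; adding double-encoding cases (text damaged twice) multiplies that again, which is where the "a lot!" comes from.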
Another heuristic would be to check candidates against a dictionary for each language, but this starts to become a very computationally intensive process.
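A minimal Python sketch of the dictionary heuristic, using a tiny made-up word list (a real one would be far larger, with one per language). The word set and `dictionary_score` are hypothetical. Each lookup in a set is cheap; the expense presumably comes from holding dictionaries for many languages and scoring every candidate decoding against each of them.

```python
# Hypothetical tiny dictionary; a real one would hold many thousands of words.
DICTIONARY = {"the", "data", "may", "be", "lost", "is", "there", "any", "way"}

def dictionary_score(text: str) -> float:
    """Fraction of whitespace-separated words found in DICTIONARY."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in DICTIONARY for w in words) / len(words)

candidates = ["the data may be lost", "thÃ© dÃ¢ta mÃ¥y"]
print(max(candidates, key=dictionary_score))  # the data may be lost
```

Splitting on whitespace does not work for Chinese, which has no spaces between words, so a real tool would need a word segmenter there.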
Upvotes: 1
Reputation: 54212
This answer is not really a solution, but there is software on the Internet with encoding-repair functions that can do the job.
One of them is a Chinese program ( http://www.cpatch.org/thread-12818-1-1.html ). I'm posting the link here in case someone is looking for it.
I tried the PHP functions mb_detect_encoding and iconv, but neither of them could convert the string successfully. The data may be permanently lost because of an incomplete copy and paste.
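The replacement characters (�, U+FFFD) in the garbled string suggest that some bytes were already replaced before the text arrived, so no exact round trip exists. A hedged Python sketch of a partial recovery, assuming UTF-8 read as Latin-1: substitute the unencodable characters, then let the UTF-8 decoder mark the broken sequences instead of failing. `partial_repair` is a hypothetical name, and the damage simulation (replacing bytes 0x80 to 0x9F, which are C1 control codes in Latin-1) is an assumption for the demo, not a claim about how this particular string was damaged.

```python
def partial_repair(garbled: str) -> str:
    """Recover whatever characters survived; broken ones become U+FFFD."""
    # U+FFFD has no Latin-1 byte, so errors="replace" writes "?" in its place.
    raw = garbled.encode("latin-1", errors="replace")
    # Sequences broken by the lost bytes decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")

# Hypothetical damage: UTF-8 read as Latin-1, with bytes 0x80-0x9F
# replaced by U+FFFD along the way.
original = "在家裡主"
garbled = "".join(
    "\ufffd" if 0x80 <= b <= 0x9F else chr(b) for b in original.encode("utf-8")
)

print("lost bytes:", garbled.count("\ufffd"))  # lost bytes: 1
print(partial_repair(garbled))
```

Characters whose bytes all survived ("家裡主" here) come back intact; the one that lost a byte cannot be reconstructed, which matches the suspicion that the data is partly gone for good.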
Upvotes: 0