Shailesh Kumar

Reputation: 6967

Encoding conversion from JIS X 0208 to Unicode

How can I convert a JIS X 0208 encoded string to Unicode in C++? A VC++-specific answer would be helpful.

The bigger problem I am having trouble understanding is that there are so many encodings for Japanese characters. JIS itself has many versions, and then there is Shift-JIS. It would be great if someone could point me towards a good explanation of these in English.

I looked through the code page identifiers in MSDN. The list does include Japanese (JIS 0208-1990 and 0121-1990), but I am wondering what the difference is between JIS 0208 and JIS X 0208.

Upvotes: 1

Views: 3664

Answers (4)

habe

Reputation: 11

“JIS X 0208” is the name of a character set specification (i.e., it defines the abstract characters and assigns each one a number). The spec does not define how to encode the characters (i.e., their byte representation). There are three major encodings for JIS X 0208: ISO-2022-JP, EUC-JP and Shift_JIS.
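To make the character set vs. encoding distinction concrete, here is a minimal sketch of how one JIS X 0208 character number turns into bytes in two of those encodings. It assumes the usual row/cell ("ku"/"ten") numbering from 1 to 94 and covers only the two-byte JIS X 0208 part of each encoding; the `Kuten` struct and the function names are made up for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// A JIS X 0208 character is identified by a row ("ku") and a cell ("ten"),
// each in the range 1..94. (Hypothetical helper type for this sketch.)
struct Kuten {
    int row;
    int cell;
};

// EUC-JP: each of the two bytes is 0xA0 plus the row or cell number.
std::array<std::uint8_t, 2> to_euc_jp(Kuten k) {
    assert(k.row >= 1 && k.row <= 94 && k.cell >= 1 && k.cell <= 94);
    return {static_cast<std::uint8_t>(0xA0 + k.row),
            static_cast<std::uint8_t>(0xA0 + k.cell)};
}

// ISO-2022-JP: 7-bit bytes (0x20 plus row or cell), wrapped in escape
// sequences that switch to JIS X 0208 (ESC $ B) and back to ASCII (ESC ( B).
std::string to_iso_2022_jp(Kuten k) {
    assert(k.row >= 1 && k.row <= 94 && k.cell >= 1 && k.cell <= 94);
    std::string out = "\x1B$B";
    out += static_cast<char>(0x20 + k.row);
    out += static_cast<char>(0x20 + k.cell);
    out += "\x1B(B";
    return out;
}
```

For example, hiragana あ is row 4, cell 2, so `to_euc_jp(Kuten{4, 2})` gives the bytes 0xA4 0xA2, while the ISO-2022-JP form carries 0x24 0x22 between the two escape sequences. Same character number, different byte representations.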

So “JIS X 0208 encoded string” is ambiguous. If you mean a “CP932 encoded string” (CP932 is the most widely used variant of Shift_JIS), you can use the MultiByteToWideChar() Win32 API with code page 932 as the first argument.

JIS 0208 and JIS X 0208 are probably the same thing (the latter is the correct name of the specification).

“0121-1990” in MSDN must be a typo for “0212-1990”. JIS X 0212 is also a character set specification; it contains rarely used characters (mostly kanji).
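A minimal sketch of that conversion, assuming the input really is CP932 bytes. The Win32 function that performs it is `MultiByteToWideChar`; the wrapper name `cp932_to_wide` is hypothetical, and on non-Windows builds the sketch simply returns an empty string so that it still compiles.

```cpp
#include <cassert>
#include <climits>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts CP932 (Windows Shift_JIS) bytes to a UTF-16 std::wstring.
// Returns an empty string on failure or on non-Windows builds.
std::wstring cp932_to_wide(const std::string& bytes) {
    // MultiByteToWideChar takes the input length as an int.
    assert(bytes.size() <= static_cast<std::size_t>(INT_MAX));
#ifdef _WIN32
    if (bytes.empty()) return std::wstring();
    const int in_len = static_cast<int>(bytes.size());
    // First call: ask how many UTF-16 code units are needed.
    const int out_len = MultiByteToWideChar(932, MB_ERR_INVALID_CHARS,
                                            bytes.data(), in_len, nullptr, 0);
    if (out_len <= 0) return std::wstring();  // invalid input or other error
    std::wstring out(static_cast<std::size_t>(out_len), L'\0');
    // Second call: perform the conversion into the buffer.
    const int written = MultiByteToWideChar(932, MB_ERR_INVALID_CHARS,
                                            bytes.data(), in_len, &out[0], out_len);
    if (written <= 0) return std::wstring();
    return out;
#else
    return std::wstring();  // assumption: no Win32 API available here
#endif
}
```

The two-call pattern (size query, then conversion) avoids guessing the output buffer size. If the data is actually ISO-2022-JP or EUC-JP, a different code page identifier would be needed.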

Upvotes: 1

Michael Madsen

Reputation: 55009

The X refers to the type of standard. All JIS standards have some classification, so "JIS 0208" is really just used as an abbreviation for "JIS X 0208".

Upvotes: 0

devio

Reputation: 37215

JIS X 0208 seems to be outdated and superseded by JIS X 0213.

Shift JIS is an encoding of JIS X 0208, i.e. an algorithm that converts the 16-bit character codes into a sequence of 8-bit bytes.
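As a rough sketch of what that algorithm looks like, the function below maps a JIS X 0208 row/cell pair (each 1..94) to the two Shift JIS bytes using the commonly described arithmetic. It covers only the plain JIS X 0208 range, not vendor extensions such as those in CP932, and the function name is made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Maps a JIS X 0208 row/cell pair (each 1..94) to its two Shift JIS bytes.
std::array<std::uint8_t, 2> kuten_to_shift_jis(int row, int cell) {
    assert(row >= 1 && row <= 94 && cell >= 1 && cell <= 94);

    // Two rows share one lead byte, starting at 0x81.
    int lead = (row - 1) / 2 + 0x81;
    // Skip 0xA0..0xDF, which Shift JIS uses for single-byte characters.
    if (lead > 0x9F) lead += 0x40;

    int trail;
    if (row % 2 == 1) {
        // Odd rows use trail bytes 0x40..0x9E, skipping 0x7F.
        trail = cell + 0x3F;
        if (trail >= 0x7F) ++trail;
    } else {
        // Even rows use trail bytes 0x9F..0xFC.
        trail = cell + 0x9E;
    }
    return {static_cast<std::uint8_t>(lead), static_cast<std::uint8_t>(trail)};
}
```

For example, hiragana あ (row 4, cell 2) comes out as 0x82 0xA0, and katakana ア (row 5, cell 2) as 0x83 0x41. Going from Shift JIS to Unicode still needs a lookup table, since the JIS X 0208 numbering has no arithmetic relationship to Unicode code points.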

I found this mapping table from JIS to Unicode and this C converter from JIS X 0208 to Unicode.

Hope this helps.

Upvotes: 1

Glen

Reputation: 22300

The ICU project contains many functions for converting to and from Unicode. It works on most operating systems, including Windows, and handles conversions to and from pretty much every code page out there.

From what I can see, JIS X 0208 and JIS 0208 appear to be two variations of the name for the same thing, i.e. the actual code page is the same.

Here's the Wikipedia article on JIS 0208. Hopefully it will answer some of your questions, as it goes into more depth on the history of JIS and its different versions.

Upvotes: 1
