Guibao Wang

Reputation: 415

Storing UTF-8 characters in Oracle with default NLS_CHARACTERSET setting

For some legacy reasons we have an Oracle database with NLS_CHARACTERSET = WE8MSWIN1252. Now we want to store CJK characters in a VARCHAR2 field. Is this possible, and if so, what do I have to do to implement it?

PS: as the product has already been released, changing the NLS_CHARACTERSET is out of the question.

EDIT

So far we have come up with an idea like this: for each CJK character we break it into its UTF-8 byte representation and store that byte sequence in the database. When reading, we reassemble the bytes back into the CJK character. For example, the Chinese character 中 is 0xe4, 0xb8, 0xad in UTF-8, so we store those 3 bytes instead.
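Roughly, the idea in Java looks like the sketch below (the class and variable names are just for illustration):

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteRoundTrip {
    public static void main(String[] args) {
        String original = "中";                                   // U+4E2D
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // 0xe4 0xb8 0xad

        // Print the individual UTF-8 bytes we intend to store.
        for (byte b : utf8) {
            System.out.printf("0x%02x ", b & 0xFF);
        }
        System.out.println();

        // Reassembling the same bytes restores the character,
        // as long as the bytes come back from the database unchanged.
        String restored = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(restored);                              // prints 中
    }
}
```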

However, this approach does not seem to work correctly. If we store the Chinese character 华, which is 0xe5, 0x8d, 0x8e in bytes, it becomes 0xe5, 0xbf, 0x8e in the database.

We are using Java; we have no idea whether this has anything to do with the result.

Upvotes: 0

Views: 2546

Answers (1)

Justin Cave

Reputation: 231651

Not correctly, no. If the character set of the database is Windows-1252, the only characters you can properly store in a VARCHAR2 column are those that exist in the Windows-1252 character set.

If the NLS_NCHAR_CHARACTERSET is Unicode (generally AL16UTF16), you could create an NVARCHAR2 column and store CJK characters in that new column. Your application may need code changes in order to support NVARCHAR2 columns.
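As a rough sketch of what that could look like from Java (assuming NLS_NCHAR_CHARACTERSET is AL16UTF16, and using a hypothetical products table with an NVARCHAR2 name column; the connection URL and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class NVarchar2Example {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "scott");
        props.setProperty("password", "tiger");
        // Ask the Oracle JDBC driver to bind String parameters as national-character-set
        // data instead of converting them through the database character set.
        props.setProperty("oracle.jdbc.defaultNChar", "true");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", props)) {

            // Hypothetical table: CREATE TABLE products (id NUMBER, name NVARCHAR2(100));
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO products (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setNString(2, "中华");   // bind as NVARCHAR2 data
                ps.executeUpdate();
            }

            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT name FROM products WHERE id = ?")) {
                ps.setInt(1, 1);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Should print the CJK text intact.
                        System.out.println(rs.getNString(1));
                    }
                }
            }
        }
    }
}
```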

Upvotes: 1
