YerivanLazerev
YerivanLazerev

Reputation: 365

Issues with latin chars encoding in Oracle table

I have an oracle database with

NLS_CHARACTERSET=AL32UTF8  

sqlplus nls_lang is WE8ISO8859P1

unix shell is iso aswell

There is varchar2 column with letter: LATIN SMALL LETTER O WITH ACUTE

When I call DUMP function on this column I get:

Typ=1 Len=4: 195,131,194,179

This doesn't look like UTF8.

Running SELECT shows corrupted cha on screen.

Running CONVERT(VALUE, 'WE8ISO8859P1', 'UTF8') - shows correct value, but the encoding doesn't look like UTF8...


If I try to insert same letter manually, the select query returns correct char.

The Dump shows 195,179 which is correct UTF8 code.

Running CONVERT(VALUE, 'WE8ISO8859P1', 'UTF8') - fails
ORA-29275: partial multibyte character

Which is strange because 195,179 is indeed UTF8.

Please clarify the above to me.

Upvotes: 0

Views: 2297

Answers (1)

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59436

Value Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b3 was obviously inserted wrongly.

It happened when the client sent the characters as UTF-8 but the database treated this data as WE8ISO8859P1.

If the client sends data as UTF-8 then you must also set NLS_LANG=.AL32UTF8 (language and territory are optional and not relevant here).

However when you run sqlplus then sqlplus inherits character set from command line which seems to be ISO 8859-1. So before you start sqlplus you must run export NLS_LANG=.WE8ISO8859P1 in order to set it properly.

I create a table with your data:

SELECT 
    val, 
    DUMP(val, 1016) as dump_val, 
    DUMP(CONVERT(VAL, 'WE8ISO8859P1'), 1016) as conv_dump
FROM AA;


+-------------------------------------------------------------------------------------------+
|VAL|DUMP_VAL                                      |CONV_DUMP                               |
+-------------------------------------------------------------------------------------------+
|ó |Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b3|Typ=1 Len=2 CharacterSet=AL32UTF8: c3,b3|
|ó  |Typ=1 Len=2 CharacterSet=AL32UTF8: c3,b3      |Typ=1 Len=1 CharacterSet=AL32UTF8: f3   |
+-------------------------------------------------------------------------------------------+

Row 1 shows corrupt data, row 2 is fine.

CONVERT(VAL, 'WE8ISO8859P1') is used to convert corrupt data to correct character set, so conversion of CONVERT(UTL_RAW.CAST_TO_VARCHAR2('C383C2B3'), 'WE8ISO8859P1') is fine.

However CONVERT(UTL_RAW.CAST_TO_VARCHAR2('C3B3'), 'WE8ISO8859P1') would return hex F3 which is not a valid UTF-8 byte sequence, thus you get an ORA-29275.

Upvotes: 1

Related Questions