Reputation: 365
I have an oracle database with
NLS_CHARACTERSET=AL32UTF8
sqlplus nls_lang is WE8ISO8859P1
unix shell is iso aswell
There is varchar2 column with letter: LATIN SMALL LETTER O WITH ACUTE
When I call DUMP function on this column I get:
Typ=1 Len=4: 195,131,194,179
This doesn't look like UTF8.
Running SELECT shows corrupted cha on screen.
Running CONVERT(VALUE, 'WE8ISO8859P1', 'UTF8')
- shows correct value, but the encoding doesn't look like UTF8...
If I try to insert same letter manually, the select query returns correct char.
The Dump shows 195,179 which is correct UTF8 code.
Running CONVERT(VALUE, 'WE8ISO8859P1', 'UTF8')
- fails
ORA-29275: partial multibyte character
Which is strange because 195,179 is indeed UTF8.
Please clarify the above to me.
Upvotes: 0
Views: 2297
Reputation: 59436
Value Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b3
was obviously inserted wrongly.
It happened when the client sent the characters as UTF-8 but the database treated this data as WE8ISO8859P1.
If the client sends data as UTF-8 then you must also set NLS_LANG=.AL32UTF8
(language and territory are optional and not relevant here).
However when you run sqlplus then sqlplus inherits character set from command line which seems to be ISO 8859-1. So before you start sqlplus you must run export NLS_LANG=.WE8ISO8859P1
in order to set it properly.
I create a table with your data:
SELECT
val,
DUMP(val, 1016) as dump_val,
DUMP(CONVERT(VAL, 'WE8ISO8859P1'), 1016) as conv_dump
FROM AA;
+-------------------------------------------------------------------------------------------+
|VAL|DUMP_VAL |CONV_DUMP |
+-------------------------------------------------------------------------------------------+
|ó |Typ=1 Len=4 CharacterSet=AL32UTF8: c3,83,c2,b3|Typ=1 Len=2 CharacterSet=AL32UTF8: c3,b3|
|ó |Typ=1 Len=2 CharacterSet=AL32UTF8: c3,b3 |Typ=1 Len=1 CharacterSet=AL32UTF8: f3 |
+-------------------------------------------------------------------------------------------+
Row 1 shows corrupt data, row 2 is fine.
CONVERT(VAL, 'WE8ISO8859P1')
is used to convert corrupt data to correct character set, so conversion of CONVERT(UTL_RAW.CAST_TO_VARCHAR2('C383C2B3'), 'WE8ISO8859P1')
is fine.
However CONVERT(UTL_RAW.CAST_TO_VARCHAR2('C3B3'), 'WE8ISO8859P1')
would return hex F3
which is not a valid UTF-8 byte sequence, thus you get an ORA-29275.
Upvotes: 1