Reputation: 105
I have a script which runs on Oracle Databases (Windows, or Unix OS). It extracts data and then spools it to .txt files.
To ensure the file is unchanged, the data is hashed while running the script and this hash is later recalculated in a webapplication. This works 9/10 times, but sometimes it provides a mismatch, even though the files are identical and I isolated this to be an encoding issue.
In order to determine the encoding used for the files, the script writes 3 NONASCII characters to a file, which are encoded differently in different encoding schemes. These are later mapped on the backend.
--Encoding related information
SPOOL &&file_desc/Encoding.txt
SELECT ('€'||';'||'ƒ'||';'||'‰') FROM sys.dual;
SPOOL off
On a database with UTF-8 encoding, data with NONASCII chars should be spooled correctly and the 3 NONASCII characters should also be spooled correctly.
When using .AL32UTF8 system charset (same as DB), the data is spooled correctly, but the 3 characters for encoding are not. This makes it impossible for me to determine which encoding scheme was used.
Database has the following charactersets (obtained from database_properties):
NLS_CHARACTESET: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16
When using SQL-Developer (after setting the encoding to UTF8)), I have no issues. Both the Japanese and Greek characters show up correctly and the characters used for encoding correctly show up, leading to a successful hashing match when later recalculated.
I need it to work in SQL*Plus as well though and I’ve been running into issues. I’ve tried a range of different variations. DB is Oracle 18c express edition:
Set just char code page to utf-8 corresponding to DB chcp 65001 (code for utf-8) NLS_LANG charset: .WE8MSWIN1252 The file with the tablename including the japanese characters gives an encoding ‘error’: JAPANESE¿ The file with the 3 chars used to determine encoding does work well: €;ƒ;‰
No changes to code page, but updated NLS_LANG charset NLS_LANG charset: .AL32UTF8 The file with the tablename including the japanese character shows fine now: JAPANESE世 The file with the 3 chars used to determine encoding is now suddenly ‘empty’ however: ;;
Set NLS_LANG to same as DB and updated code pages chcp 65001 (code for utf-8) NLS_LANG charset: .AL32UTF8 The file with the tablename including the japanese character shows fine now: JAPANESE世 The file with the 3 chars used to determine encoding is now suddenly ‘empty’ however: ;;
Set NLS_LANG to NLS_NCHAR_Characterset. Just in case I also tried to set the system charset to AL16UTF16 which is equal to the NLS_NCHAR_Characterset, thinking that might help solve the issue, but then I get the following error: Error 19 initializing SQL*PLUS Invalid NLS character set for this OS environment
The code page doesn't really matter as it just concerns itself with outputting to the command window.
As could be expected, when using the WE8MSWIN1252 the spooling of NONASCII characters doesn't work, because it doesn't know the characters.
Yet when using AL32UTF8, the spooling of the 3 NONASCII characters form sys.dual no longer works, even though it does know the characters.
I've spent several days isolating this issue, I'm at a loss why the latter is the case, can anyone help me?
*Edit, kfinity offerd the solution, although not 100% certain yet why this is the case. So if anyone knows?
If we replace the select statement to the following, it does work:
select unistr('\20AC;\0192;\2030') from dual;
Upvotes: 5
Views: 5019
Reputation: 1362
The error Invalid UTF8 encoding
may occur if there are no results returned by SELECT
Upvotes: 0
Reputation: 9091
I'm not really sure what the issue is, but if you try doing:
select unistr('\20AC;\0192;\2030') from dual;
I think it would bypass the issue of getting SQL*Plus to read the correct character values from the .sql file.
Upvotes: 1
Reputation: 59436
sqlplus inherits the code page/encoding from parent cmd
window. NLS_LANG
tells the database which characterset/encoding is used by the client. So, when you run
chcp 65001
set NLS_LANG=.AL32UTF8
sqlplus ....
or
chcp 1252 (the default)
set NLS_LANG=.WE8MSWIN1252
sqlplus ....
then in principle you do it correctly. However, the windows cmd
does not fully support UTF-8, see How to use unicode characters in Windows command line? and/or https://community.oracle.com/tech/developers/discussion/600575/how-to-use-sqlplus-with-utf8-on-windows-command-line
You may have to use another tool than sqlplus to create your file.
See also OdbcConnection returning Chinese Characters as "?"
Upvotes: 3