DeadlyJesus
DeadlyJesus

Reputation: 1533

UTL_FILE and character set

I've been working on this thing for days and it's driving me crazy.
I have an oracle procedure that write a file using UTL_FILE. I used to store my values as NVARCHAR2 and write my file using UTL_FILE.PUT_LINE_NCHAR procedure, and it wrote file in (what notepad++ consider as) UTF8.
The file is then used by another program, the problem is that said program read it using WE8MSWIN1252, and I can't change that, since it's legacy code.
So I tried to use UTL_FILE.PUT_LINE procedure instead, but the file was still considered as UTF8. I saw in oracle's documentation that NVARCHAR2 used the national character set (mine is AL16UTF16), so I tried to use the CONVERT method like this:

CONVERT(whatIWantToWrite, 'WE8MSWIN1252', 'AL16UTF16'))

and it raised the ORA-29298 Character set mismatch Exception. I don't get it, my NLS_NCHAR_CHARACTERSET is AL16UTF16 why can't I convert it to WE8MSWIN1252 ?
Is there another way to write a file using WE8MSWIN1252 ?

Upvotes: 4

Views: 35983

Answers (3)

Sylwek
Sylwek

Reputation: 1

You can use dbms_xslprocessor.clob2file.

declare
  l_str varchar2(20);
BEGIN
  select str into l_str from t42;
  dbms_xslprocessor.clob2file(to_clob(l_str), 'UTLDIR', 'file.txt', 2000);
END;

AL16UTF16(csid)=2000 WE8MSWIN1252(csid)=178 To get CSID:

SELECT NLS_CHARSET_ID('WE8MSWIN1252') FROM DUAL; 

Upvotes: 0

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59436

Maybe it is an option for you to convert the file afterwards it has been written to disc, e.g. with Java tool Native-to-ASCII Converter.

native2ascii -encoding UTF8 my_text_file_utf.txt my_text_file.tmp
native2ascii -reverse -encoding windows-1252 my_text_file.tmp my_text_file_1252.txt

Upvotes: 0

Alex Poole
Alex Poole

Reputation: 191235

This seems to be because you're still opening the file with fopen_nchar. If I do this:

create table t42(str nvarchar2(20));
insert into t42 values ('Hello');

declare
  file utl_file.file_type;
  l_str nvarchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file, convert(l_str, 'WE8MSWIN1252', 'AL16UTF16'));
  utl_file.fclose(file);
end;
/

... then I get a file containing 䡥汬, which the Linux file command reports as UTF-8 Unicode text; Notepad++ shows 䡥汬 and says the file is 'ANSI as UTF-8'.

If I change the fopen to fopen_nchar:

  file := utl_file.fopen_nchar('CENSYS_EXPORT_DIR', 'dummy.dat', 'w', 32767);

... then I get ORA-29298: Character set mismatch and an empty file.

If I go back to fopen but change the PL/SQL variable to varchar2:

declare
  file utl_file.file_type;
  l_str varchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file, convert(l_str, 'WE8MSWIN1252', 'AL16UTF16'));
  utl_file.fclose(file);
end;
/

... then the file contains ¿¿ (in vim) and the file is reported as ISO-8859 text. But Notepad++ shows ߿ and says the file is ANSI.

Rather than using convert, which Oracle discourages, you can bounce it through raw:

declare
  file utl_file.file_type;
  l_str varchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file,
    utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw(l_str),
      'ENGLISH_UNITED KINGDOM.WE8MSWIN1252', 'ENGLISH_UNITED KINGDOM.UTF8')));
  utl_file.fclose(file);
end;
/

In Linux that shows as Hello and the file is reported as ASCII text; Notepad++ shows it as Hello as well, and again says the file is ANSI. I'm unclear if that gets you where you need to be... and you might need a different language and locale, of course.

But my database character set is AL32UTF8, and my national character set is AL16UTF16, so you might see different behaviour; if your database character set is WE8MSWIN1252 then the file will be created as that as well; from the documentation:

UTL_FILE expects that files opened by UTL_FILE.FOPEN in text mode are encoded in the database character set. It expects that files opened by UTL_FILE.FOPEN_NCHAR in text mode are encoded in the UTF8 character set.

Upvotes: 6

Related Questions