Dubas

Reputation: 2876

DB2 insert UTF-8 characters on non-Unicode database with ALT_COLLATE UNICODE

I am trying to insert Chinese text into a DB2 database, but it is not working.

The database is configured by default as ANSI (en_US, code page 819), which is a requirement for other applications that use the same database. ALT_COLLATE IDENTITY_16BIT is defined and Unicode tables are created using CCSID UNICODE, but Unicode characters for Chinese or Korean are not inserted.

Example table:

CREATE TABLE LANGS (
    IDIOMA  char(2) NOT NULL,
    PAIS    char(2) NOT NULL,
    TRADUC  long varchar NOT NULL
) CCSID UNICODE;

Example insert:

INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','其他');

System Information:

Example Java extract:

Class.forName("com.ibm.db2.jcc.DB2Driver");

...

Properties props = new Properties();
props.setProperty("user", user);
props.setProperty("password", pass);
props.setProperty("DB2CODEPAGE", "1208");
props.setProperty("retrieveMessagesFromServerOnGetMessage", "true");

con = DriverManager.getConnection(url, props);

...

Statement statement = con.createStatement();
statement.execute(sql);

...
statement.close();
con.close();

DB2 database locale configuration (output of db2 get db cfg):

Database territory                                      = en_US
Database code page                                      = 819
Database code set                                       = iso8859-1
Database country/region code                            = 1
Database collating sequence                             = UNIQUE
Alternate collating sequence              (ALT_COLLATE) = IDENTITY_16BIT
Database page size                                      = 4096
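These same values can also be checked over the JDBC connection. A minimal sketch, assuming the SYSIBMADM.DBCFG administrative view is available on this server and the parameters appear under their lowercase names:

Statement cfg = con.createStatement();
ResultSet rs = cfg.executeQuery(
    "SELECT NAME, VALUE FROM SYSIBMADM.DBCFG WHERE NAME IN ('codepage', 'codeset', 'alt_collate')");
while (rs.next()) {
    // should print e.g. codepage = 819, codeset = iso8859-1, alt_collate = IDENTITY_16BIT
    System.out.println(rs.getString("NAME") + " = " + rs.getString("VALUE"));
}
rs.close();
cfg.close();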

Statements are executed correctly and rows appear correctly in the database for some languages, but not for others such as Chinese or Korean.

Inserting from the command line with db2cmd also does not work for these languages (the row is inserted, but the characters are stored as single bytes).

Inserting from the command line in a Linux environment localized as zh_TW works. Inserting from the command line in a Linux environment localized as en_US.utf-8 also works.

It never works from Java in any of these environments.


Using "X" as prefix form the VARCHAR field is not an option due some restrictions and the SQL works on two environments.

I think it may be an encoding problem on the client or the server, caused by configuration, file encoding, or SQL encoding.


Update:

I also tried to load a UTF-8 file containing the SQL statements. The file loads correctly, and while debugging I can see that the SQL with UTF-8 characters is correctly passed to the Statement, but the result is the same.

new InputStreamReader(new FileInputStream(file),"UTF-8")
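
For context, the UTF-8 reader is handed to the executeLineByLine method below roughly like this (a sketch of the call, not the exact code):

Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
try {
    executeLineByLine(reader);
} finally {
    reader.close();
}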

...

private void executeLineByLine(Reader reader) throws SQLException {
    StringBuffer command = new StringBuffer();
    try {
        BufferedReader lineReader = new BufferedReader(reader);
        String line;
        while ((line = lineReader.readLine()) != null) {
            command = handleLine(command, line);
        }
        checkForMissingLineTerminator(command);
    } catch (Exception e) {
        String message = "Error executing: " + command + ".  Cause: " + e;
        printlnError(message);
        throw new SQLException(message, e);
    }
}


private StringBuffer handleLine(StringBuffer command, String line) throws SQLException, UnsupportedEncodingException {
    String trimmedLine = line.trim();
    if (lineIsComment(trimmedLine)) {
        println(trimmedLine);
    } else if (commandReadyToExecute(trimmedLine)) {
        command.append(line.substring(0, line.lastIndexOf(delimiter)));
        command.append(LINE_SEPARATOR);
        println(command);
        executeStatement(command.toString());
        command.setLength(0);
    } else if (trimmedLine.length() > 0) {
        command.append(line);
        command.append(LINE_SEPARATOR);
    }
    return command;
}

private void executeStatement(String command) throws SQLException, UnsupportedEncodingException {
    boolean hasResults = false;
    Statement statement = connection.createStatement();
    hasResults = statement.execute(command);
    printResults(statement, hasResults);
    statement.close();
}
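
The extract references a few members that are not shown (delimiter, LINE_SEPARATOR, lineIsComment, commandReadyToExecute). They are not part of the problem; for completeness, here is a minimal sketch of what they typically look like in this kind of script runner (an assumption, not the actual code):

private static final String LINE_SEPARATOR = System.getProperty("line.separator");
private String delimiter = ";";

// a trimmed line is treated as a comment if it starts with the usual SQL comment markers
private boolean lineIsComment(String trimmedLine) {
    return trimmedLine.startsWith("--") || trimmedLine.startsWith("//");
}

// the accumulated command is ready to run once the current line ends with the delimiter
private boolean commandReadyToExecute(String trimmedLine) {
    return trimmedLine.endsWith(delimiter);
}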

Update 2:

It's not possible to change the data types. The database is part of other systems and already contains data.

The database is installed on 7 different servers. On the three of them where the data is inserted from Linux in a UTF-8 shell, the data was inserted correctly from the db2 command line.

From the Windows db2 command line, or using Java, it is not possible to insert the characters correctly.

Changing the Java sources to UTF-8 makes System.out print the SQL correctly, just as I see when debugging the sql variable.

When I insert this test SQL, it is shown correctly with Chinese characters in System.out and in the Statement's internal variable:

INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','TEST1 其他 FIN TEST1');

But in the database the text appears as:

TEST3  FIN TEST3

Hex representation:

54 45 53 54 33 20 1A 1A 1A 1A 1A 1A 1A 1A 20 46 49 4E 20 54 45 53 54 33
T  E  S  T  3  _  ?  ?  ?  ?  ?  ?  ?  ?  _  F  I  N  _  T  E  S  T  3
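
A dump like this can be reproduced from the Java side by reading the column back and printing its bytes; a minimal sketch reusing the connection from the extracts above (the original dump may have been produced differently):

Statement st = connection.createStatement();
ResultSet rs = st.executeQuery("SELECT TRADUC FROM LANGS WHERE IDIOMA = 'zh' AND PAIS = 'TW'");
while (rs.next()) {
    // encode the returned string to Latin-1 to get a byte-per-character dump comparable to the one above
    for (byte b : rs.getString(1).getBytes(java.nio.charset.StandardCharsets.ISO_8859_1)) {
        System.out.printf("%02X ", b);
    }
    System.out.println();
}
rs.close();
st.close();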

I think the DB2 Java client is probably always using the Windows code page (in this case ISO-8859-1 or cp1252) instead of UTF-8, or the server is converting the data using the main collating sequence instead of the alternative collation of the table. The 0x1A bytes in the dump are the substitution character, which suggests the Chinese characters were replaced during a code page conversion.

Update 3:

I installed a Java SQL tool called DbVisualizer. Using this tool on Windows, when I paste the SQL into its SQL panel and run it, the row is inserted correctly into the database.

This makes me suspect that it is not a problem with the installation or the data types. It is probably one of the three factors mentioned above: client configuration, server configuration, or file/SQL encoding.

Upvotes: 1

Views: 14668

Answers (1)

Dubas

Reputation: 2876

The problem is solved using these steps:

  1. Always use db2jcc4.jar, not db2jcc.jar (JDBC 4); see the check sketched after this list.

    • (In some places, JDBC level 2 was configured in the OS classpath with db2jcc.jar instead of db2jcc4.jar.)
  2. Set the environment variable DISABLEUNICODE=0
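
To verify point 1, you can check which driver jar and version are actually loaded at runtime; a minimal sketch using only standard Java and JDBC calls, where con is the connection from the question's extract:

Class<?> driverClass = Class.forName("com.ibm.db2.jcc.DB2Driver");
// location of the jar the driver class was loaded from; it should point to db2jcc4.jar
System.out.println(driverClass.getProtectionDomain().getCodeSource().getLocation());

DatabaseMetaData meta = con.getMetaData();
// driver name and version as reported by the driver itself
System.out.println(meta.getDriverName() + " " + meta.getDriverVersion());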

There is complete information about Unicode on DB2 on the page Understanding DB2 Universal Database character conversion.

Upvotes: 2
