JDBC with Non-Unicode database, how to specify handling of unsupported characters?

Question

I have a Java application that works with Unicode and a database (Oracle, MSSQL, DB2, MySQL) that uses an 8-bit non-Unicode code page (for example IBM1141). Migrating the database to Unicode is not an option.

Is there any way to specify how the JDBC driver behaves (replace/error/warn) when the application passes a Unicode character that cannot be encoded in the database encoding?

Laurenz Albe · Accepted Answer

The JDBC specification has nothing to say on the topic of encoding, so it is up to the implementation to handle this.

Since Java itself uses UTF-16 internally, every JDBC driver that is worth its salt will automatically convert between the database encoding and UTF-16.

The behaviour of a JDBC driver when it encounters characters that it cannot convert is implementation specific and will depend on the “philosophy” of the database system.

The two JDBC drivers I know well behave differently:

Oracle JDBC will silently replace characters that cannot be converted with a replacement character. There is no way to get the Oracle JDBC driver or the Oracle database to throw an error.
PostgreSQL JDBC will always report an error if a character cannot be converted. There is no way to get PostgreSQL to silently modify the character or store an invalid character.

This is normally not an issue when you read data from the database, because every character in the database encoding can be converted to UTF-16, but it will be a problem when writing to the database. You'll have to sanitize the data yourself before writing it to the database.
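One way to do that sanitizing on the Java side is to check strings against the database encoding with `java.nio.charset.CharsetEncoder` before handing them to the driver. The following is only a sketch under assumptions: the class name `DbEncodingGuard` and its methods are made up for illustration, the caller is assumed to know which `Charset` corresponds to the database code page (and that the JDK in use actually ships it), and the replacement character is assumed to be encodable in that charset. It does not change anything about how the driver itself behaves.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper: applies an "error" or "replace" policy before a JDBC write. */
public final class DbEncodingGuard {

    private DbEncodingGuard() {}

    /** Returns true if the whole string can be encoded in the given charset. */
    public static boolean isEncodable(String text, Charset dbCharset) {
        return dbCharset.newEncoder().canEncode(text);
    }

    /** "Error" policy: returns the text unchanged, or throws if any character is not encodable. */
    public static String requireEncodable(String text, Charset dbCharset) {
        CharsetEncoder encoder = dbCharset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // The encoded bytes are discarded; only success or failure matters here.
            encoder.encode(CharBuffer.wrap(text));
            return text;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Text cannot be represented in " + dbCharset.name(), e);
        }
    }

    /** "Replace" policy: substitutes each unencodable code point with the given replacement. */
    public static String replaceUnencodable(String text, Charset dbCharset, char replacement) {
        CharsetEncoder encoder = dbCharset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Step by code point so that a surrogate pair is tested and replaced as one unit.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append(replacement);
            }
            i += length;
        }
        return out.toString();
    }
}
```

The application would then call, for example, `DbEncodingGuard.requireEncodable(value, dbCharset)` or `DbEncodingGuard.replaceUnencodable(value, dbCharset, '?')` on each string before passing it to `PreparedStatement.setString`. A "warn" policy could be built the same way by logging whenever `isEncodable` returns false. Note that the Java charset and the database's own conversion tables may disagree on a few characters, so this check is an approximation of what the database will accept.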
