Newl
Newl

Reputation: 330

How to fix character encoding in sql query

I have a db2 database where I store names containing special characters. When I try to retrieve them with an internal software, I get proper results. However when I tried to do the same with queries or look into the db, the characters are stored strangely.

The documentation says that the encoding is utf-8 latin1. My query looks something like this:

SELECT firstn, lastn
FROM unams
WHERE unamid = 12345

The user with the given ID has some special characters in his/her name: é and ó, but the query returns it as Ă© and Ăł.

Is there a way to convert the characters back to their original form with using some simple SQL function? I am new to databases and encoding, trying to understand the latter by reading this but I'm quite lost.

EDIT: Currently sending queries via SPSS Modeler with a proper ODBC driver, the database lies on a Windows Server 2016

Upvotes: 1

Views: 6894

Answers (2)

mao
mao

Reputation: 12267

Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208 , then restart, then drop and re-populate the tables.

If the applications runs locally on the Db2-server (i.e. only one hostname is involved) then the same variable can be set. This will impact all local applications that use the UTF-8 encoded database.

If the application runs remotely from the Db2-server (i.e. two hostnames are involved) then set the variable on the workstation and on the Windows Db2-server.

Current versions of IBM supplied Db2-clients on Windows will derive their codepage from the regional settings which might not always render Unicode characters correctly, so using the DB2CODEPAGE=1208 forces the Db2-client CLI drivers to use a Unicode application code page to override this.

Upvotes: 2

Mark Barinstein
Mark Barinstein

Reputation: 12299

with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hes
from 
  t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>' 
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);

C C_HEX
- -----
é C3A9
ó C3B3

The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check if you get the same output. It must be as described for a UTF-8 database.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.

If you see the same hex representation of these characters stored in the firstn column, then this means, that the string is stored appropriately, but your client tool (SPSS Modeller) can't show these characters correctly due to some reason (wrong font, for example).

Upvotes: 0

Related Questions