SQL Server Code Pages and Collations

Question

Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?

Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS and then do CHAR(255) it returns я.

If I try the following on a database with SQL_Latin1_General_CP1_CS_AS collation however

SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS

It returns y

SELECT CHAR(255)

Returns ÿ so it is obviously going first via the database's default collation then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?

Robb · Accepted Answer

While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.

There are two potential methods I've seen to handle conversions, one is detailed here http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571 and involves bolting a custom conversion program onto the database and using that for conversions.

The other is to construct a db table consisting of

[CodePage], [ANSI Value], [UnicodeValue]

with the unicode value stored as either the int representing the unicode character to be converted using nchar()or the nchar itself

Your using the collation SQL_Ukrainian_CP1251_CS_AS which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT

Its a TSV so after trimming the top off the raw data should import fairly cleanly.

Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.

SQL Server Code Pages and Collations

Answers (2)

Related Questions