Reputation: 453328
Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?
Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS
and then do CHAR(255)
it returns я
.
If I try the following on a database with SQL_Latin1_General_CP1_CS_AS
collation however
SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS
It returns y
SELECT CHAR(255)
Returns ÿ
so it is obviously going first via the database's default collation then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?
Upvotes: 6
Views: 19504
Reputation: 453328
There is actually a reasonably straight forward way of doing this
SELECT CONCAT(CAST(255 AS BINARY(1)), '' COLLATE SQL_Ukrainian_CP1251_CS_AS) /* я */
So based on this method a SQL Server 2022+ compatible script that returns the 255 values from 0x01
to 0xFF
for all 16 non utf8 code pages in use (DB Fiddle).
DECLARE @DynSQL NVARCHAR(MAX);
WITH DistinctCodePage AS
(
SELECT CodePage = CAST(COLLATIONPROPERTY(name, 'CodePage') AS INT),
ExampleCollation = MAX(name)
FROM sys.fn_helpcollations()
WHERE COLLATIONPROPERTY(name, 'CodePage') NOT IN (0 /*Exclude Unicode only collations*/, 65001 /*Exclude utf8 collations*/)
GROUP BY COLLATIONPROPERTY(name, 'CodePage')
)
SELECT @DynSQL = 'SELECT value,' +
STRING_AGG(CONVERT(NVARCHAR(MAX), quotename(CodePage) + ' = CONCAT(CAST(value AS binary(1)), SPACE(0) COLLATE ' + ExampleCollation + ')' ), ',') +
'
FROM generate_series(1,255)
ORDER BY value desc'
FROM DistinctCodePage;
EXEC (@DynSQL);
Upvotes: 8
Reputation: 3851
While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.
There are two potential methods I've seen to handle conversions, one is detailed here http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571 and involves bolting a custom conversion program onto the database and using that for conversions.
The other is to construct a db table consisting of
[CodePage], [ANSI Value], [UnicodeValue]
with the unicode value stored as either the int representing the unicode character to be converted using nchar()
or the nchar itself
Your using the collation SQL_Ukrainian_CP1251_CS_AS
which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
Its a TSV so after trimming the top off the raw data should import fairly cleanly.
Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.
Upvotes: 2