Reputation: 9740
It seems that SQL Server 2008 removes some unicode characters when comparing two strings. Consider the following table:
CREATE TABLE [dbo].[Test](
[text] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[text] ASC
))
And now if I insert some rows with unicode characters:
insert into Test values(N'it᧠')
insert into Test values(N'it')
I get a unique constraint exception, even though the values are different. I'm using the default database collation here, which is SQL_Latin1_General_CP1_CI_AS.
Violation of PRIMARY KEY constraint 'PK_Test'. Cannot insert duplicate key in object 'dbo.Test'.
Note, that this doesn't happen for all unicode characters, but only for some characters, but I haven't been able to identify which unicode ranges exactly are problematic. For instance, the dingbat 0x2757 (❗) is removed in comparison, but 0x2764 (♥) is not. I guess it has something to do with 0x2757 being from a newer unicode standard.
So the question is, is there any way to make SQL Server 2008 work with these characters, or alternatively, can I programatically detect them (in C#, via unicode ranges or some such) and remove them beforehand?
Upvotes: 1
Views: 1186
Reputation: 69749
Okay, so a bit more digging shows this is almost certainly due to newer character, since this also works with the sql server 2008 equivalents of latin collation, but not the older versions, i.e. works with Latin1_General_100_CI_AS
, but not with Latin1_General_CI_AS
. To get a full list of the collations that correctly compare these strings I used:
IF OBJECT_ID('Tempdb..#T') IS NOT NULL
DROP TABLE #T;
IF OBJECT_ID('Tempdb..#V') IS NOT NULL
DROP TABLE #V;
CREATE TABLE #V (A NVARCHAR(50), B NVARCHAR(50));
INSERT #V (A, B) VALUES (N'it᧠', N'it');
CREATE TABLE #T (Collation VARCHAR(500), Match BIT);
DECLARE @SQL NVARCHAR(MAX) = (SELECT N'INSERT #T (Collation, Match)
SELECT ''' + Name + ''', CASE WHEN A = B COLLATE ' + name + ' THEN 1 ELSE 0 END
FROM #V;'
FROM sys.fn_helpcollations()
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)');
EXECUTE sp_executesql @SQL;
SELECT *
FROM #T
WHERE Match = 0;
Upvotes: 1