cokeman19
cokeman19

Reputation: 2465

Narrow No-Break Space Comparison Failing in SQL Server

In at least SQL Server 2012 and 2016, Narrow No-Break Space characters (NNBSP, Unicode 8239) are ignored in equality comparisons. For example, take the following two queries, which compare the strings “AB” and “A   B”. The first has "normal", ASCII character 32 spaces; the second uses NNBSP.

-- Do not equal, as expected
SELECT *
  FROM (SELECT N'AB' NoSpaces) NoSpaces,
       (SELECT N'A   B' NormalSpaces) NormalSpaces
WHERE NoSpaces.NoSpaces = NormalSpaces.NormalSpaces

-- Do equal, which is unexpected
SELECT *
  FROM (SELECT N'AB' NoSpaces) NoSpaces,
       (SELECT N'A' + NCHAR(8239) + NCHAR(8239) + NCHAR(8239) + N'B' NNBSpaces) NNBSpaces
WHERE NoSpaces.NoSpaces = NNBSpaces.NNBSpaces   

Some other string functions ignore the NNBSP, such as LIKE and CHARINDEX. LIKE has the same result as equals and CHARINDEX returns 0, indicating that the character was not found.

However, other functions like LEN and SUBSTRING, do acknowledge their existence.

In researching each of the above functions, I found that the current collation can be a factor in string comparison. My current collation is SQL_Latin1_General_CP1_CI_AS, however I have tried all of the "SQL_Latin1_General" collations, all of which have the same result.

Can anyone offer any insight as to why this occurs?

Upvotes: 0

Views: 862

Answers (1)

cokeman19
cokeman19

Reputation: 2465

This occurs because the default SQL Server collation for US English, SQL_Latin1_General_CP1_CI_AS, does not correctly handle the NNBSP Unicode character comparison.

As of SQL Server 2008 and newer, there are newer "Latin1" collations available, which include "_100" in their name. Using any of these newer collations, the issue I observed does not exist.

According to MSDN (which I found via this answer), these newer collations have a number of updated and corrected features. Based solely on its description, I believe the change that corrected the issue was:

Weighting has been added to previously non-weighted characters that would have compared equally.

If anyone has any additional insights, I would be interested to hear them.

Upvotes: 2

Related Questions