Reputation: 4887
Does anybody know what the WS property of a collation does? Does it have anything to do with Asian type of scripts? The MSDN docs explain it to be "Width Sensitive", but that doesn't make any sense for say Swedish, or English...?
Upvotes: 3
Views: 2022
Reputation: 453278
The accepted answer demonstrates that it does not come into play for the comparison N'a' = 'a'
. This is easily explained because the char
will get implicitly converted to nchar
in the comparison between the two anyway so both strings in the comparison are Unicode.
I just thought of an example of a place where width sensitivity might be expected to come into play in a Latin Collation only to discover that it appeared to make no difference at all there either...
DECLARE @T TABLE (
a VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS,
b VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS )
INSERT INTO @T
VALUES (N'Æ',
N'AE');
SELECT LEN(a) AS [LEN(a)],
LEN(b) AS [LEN(b)],
a,
b,
CASE
WHEN a = b THEN 'Y'
ELSE 'N'
END AS [a=b]
FROM @T
LEN(a) LEN(b) a b a=b
----------- ----------- ---- ---- ----
1 2 Æ AE Y
The Book "Microsoft SQL Server 2008 Internals" has this to say.
Width Sensitivity refers to East Asian languages for which there exists both half-width and full-width forms of some characters.
There is absolutely nothing stopping you storing these characters in a collation such as Latin1_General_100_CS_AS_WS
as long as the column has a unicode data type so I guess that the WS
part would only apply in that particular situation.
Upvotes: 1
Reputation: 8726
A good description of width sensitivity is summarized here: http://www.databasejournal.com/features/mssql/article.php/3302341/SQL-Server-and-Collation.htm
Width sensitivity
When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently then it is width sensitive.
Perhaps from an English character perspective, I would theorize that a width-sensitive collation would mean that 'abc' <> N'abc', because one string is a Unicode string (2 bytes per character), whereas the other one byte per character.
From a Latin characterset perspective it seems like something that wouldn't make sense to set. Perhaps in other languages this is important.
I try to set these types of collation properties to insensitive in general in order to avoid weird things like records not getting returned in search results. I usually keep accents set to insensitive, since that can cause a lot of user search headaches, depending on the audience of your applications.
Edit: After creating a test database with the Latin1_General_CS_AS_WS collation, I found that the N'a' = N'A' is actually true. Test queries were:
select case when 'a' = 'A' then 'yes' else 'no' end
select case when 'a' = 'a' then 'yes' else 'no' end
select case when N'a' = 'a' then 'yes' else 'no' end
So in practice I'm not sure where this type of rule comes into play
Upvotes: 5