How to find out whether collation uses word sort or string sort?

Question

https://stackoverflow.com/a/361059/14731 discusses the differences between "word sort" and "string sort".

How can one determine programmatically whether a SQL Server collation uses "word sort" or "string sort"?

Corollary: Do all collations use "word sort" for Unicode strings and "string sort" for non-Unicode strings?

SELECT *
FROM sys.fn_helpcollations()
WHERE name = 'SQL_Latin1_General_CP1_CI_AS';

returns a lot of detail about the collation, but its description makes no mention of "word sort".

Gili · Accepted Answer

srutzky's excellent answer reveals that, with the exception of non-Unicode types processed by SQL_ collators, all other data is sorted according to "Unicode Collation" rules.
Confusingly, Microsoft does not use the Unicode standard's sorting rules.
According to https://support.microsoft.com/en-us/kb/322112:
SQL Server 2000 supports two types of collations:
- SQL collations
- Windows collations
[...]

For a Windows collation, a comparison of non-Unicode data is implemented by using the same algorithm as Unicode data.

[...]

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules.
I interpret this as meaning that:
- Collations whose names start with SQL_ are "SQL collations".
- All other collations are "Windows collations".
- With the exception of non-Unicode types processed by SQL_ collations, all data is sorted according to the "Windows collation" rules.
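
The naming rule in the list above can be applied directly to the catalog. This is only a minimal sketch: it assumes the SQL_ name prefix is a reliable marker of a SQL collation (that is my interpretation; sys.fn_helpcollations() does not report it), and the collation_family alias is made up for illustration.

```sql
-- Classify every installed collation by its name prefix.
-- Assumption: names starting with "SQL_" are SQL collations and everything
-- else is a Windows collation.
-- [_] escapes the underscore, which LIKE would otherwise treat as a
-- single-character wildcard.
SELECT name,
       CASE WHEN name LIKE 'SQL[_]%' THEN 'SQL collation'
            ELSE 'Windows collation'
       END AS collation_family  -- hypothetical alias, not a catalog column
FROM sys.fn_helpcollations()
ORDER BY collation_family, name;
```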

So, let's dig into "Windows collations".

According to https://msdn.microsoft.com/en-us/library/ms143515(v=sql.105).aspx:

For Unicode data types, data comparisons are based on the Unicode code points.

winnls.h contains a brief overview of "word sort":

//  Sorting Flags.
//
//    WORD Sort:    culturally correct sort
//                  hyphen and apostrophe are special cased
//                  example: "coop" and "co-op" will sort together in a list
//
//                        co_op     <-------  underscore (symbol)
//                        coat
//                        comb
//                        coop
//                        co-op     <-------  hyphen (punctuation)
//                        cork
//                        went
//                        were
//                        we're     <-------  apostrophe (punctuation)
//
//
//    STRING Sort:  hyphen and apostrophe will sort with all other symbols
//
//                        co-op     <-------  hyphen (punctuation)
//                        co_op     <-------  underscore (symbol)
//                        coat
//                        comb
//                        coop
//                        cork
//                        we're     <-------  apostrophe (punctuation)
//                        went
//                        were

And finally, according to https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx:

[...] all punctuation marks and other nonalphanumeric characters, except for the hyphen and the apostrophe, come before any alphanumeric character. The hyphen and the apostrophe are treated differently from the other nonalphanumeric characters to ensure that words such as "coop" and "co-op" stay together in a sorted list.
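
Since the catalog does not expose the sort style, one way to check it is to probe the hyphen handling described above. The following is a rough sketch rather than an official method: it assumes that comparing 'co-op' with 'coat' is enough to tell the two styles apart, and the column aliases are made up for illustration.

```sql
-- Probe: does 'co-op' sort before 'coat' for a given collation and data type?
-- Under string sort the hyphen sorts ahead of the letters, so 'co-op' < 'coat'.
-- Under word sort the hyphen is special-cased, so 'co-op' stays next to
-- 'coop', which sorts after 'coat'.
SELECT
    CASE WHEN 'co-op' < 'coat' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 'string sort' ELSE 'word sort'
    END AS varchar_behavior,    -- non-Unicode literals
    CASE WHEN N'co-op' < N'coat' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 'string sort' ELSE 'word sort'
    END AS nvarchar_behavior;   -- Unicode literals (N prefix)
```

Swapping in another collation name tests that collation. If the interpretation above holds, a SQL_ collation should report different results for the two columns, while a Windows collation such as Latin1_General_CI_AS should report "word sort" for both.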
