Zoltán Hartai
Zoltán Hartai

Reputation: 53

Strange behaviour of SQL order by when the string consists space

SQL Example 1:

SELECT TestField 
FROM (VALUES('Ne'), ('NE')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Result 1:

Ne  
NE

SQL Example 2 (between NE and a there are 2 spaces, while between Ne and a there is only 1):

SELECT TestField 
FROM (VALUES('Ne a'), ('NE  a')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Result 2:

NE  a  
Ne a

Can someone explain it?

Thanks

Upvotes: 5

Views: 2171

Answers (3)

Bogdan Sahlean
Bogdan Sahlean

Reputation: 1

Second query:

SELECT TestField 
FROM (VALUES
    ('Ne a'), 
    ('NE  a')
--    12345
) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

For case sensitive collations, differences generated by alphabetic order (position 4: < a) is more important than differences generated by case order (position 2: e < E). Thus NE a is before Ne a .

Another example: difference between and a (position 2) is more important than case order (position 1: e vs E):

SELECT '{' + TestField  + '}'
FROM (VALUES
    ('ea'), 
    ('E ') -- or ('E')
--    12
) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS
/*
TestField
---------
{E }
{ea}
*/

More details on Rusanu's blog.

Update #1: You could use SQL_EBCDIC037_CP1_CS_AS collation:

SELECT TestField 
FROM (VALUES
    ('Ne a'), 
    ('NE  a')
--    12345
) AS TestTable(TestField)
ORDER BY TestField COLLATE SQL_EBCDIC037_CP1_CS_AS
/*
TestField
---------
Ne a
NE  a
*/

but this collation could generate some strange results.

Example:

SELECT  x.ColA AS ColA_Latin1_General_CS_AS
FROM (
    SELECT  'A'
    UNION ALL 
    SELECT  'AB'
    UNION ALL 
    SELECT  'ABC'
    UNION ALL   
    SELECT  'zzzz'
) x(ColA)
ORDER BY x.ColA COLLATE Latin1_General_CS_AS
/*
ColA_Latin1_General_CS_AS
----------------------------
A
AB
ABC
zzzz
*/

vs.

SELECT  x.ColA AS ColA_SQL_EBCDIC037_CP1_CS_AS
FROM (
    SELECT  'A'
    UNION ALL 
    SELECT  'AB'
    UNION ALL 
    SELECT  'ABC'
    UNION ALL   
    SELECT  'zzzz'
) x(ColA)
ORDER BY x.ColA COLLATE SQL_EBCDIC037_CP1_CS_AS
/*
ColA_SQL_EBCDIC037_CP1_CS_AS
----------------------------
zzzz
A
AB
ABC
*/

Note: I never used SQL_EBCDIC037_CP1_CS_AS collation and I do not recommend.

Update #2: Text values are splited in two (or more) columns

-- Scenario #1: before/during insert/update, spaces are trimmed with LTRIM
SELECT TestField1 F1,  TestFiel2 AS F2
FROM (VALUES
    ('JOHN', 'ZOE'),  
    ('JOHN', 'Albano')
) AS TestTable(TestField1, TestFiel2)
ORDER BY TestField1 COLLATE Latin1_General_CS_AS, TestFiel2 COLLATE Latin1_General_CS_AS
/*
F1   F2
---- ------
JOHN Albano
JOHN ZOE
*/
-- Scenario #2: during insert/update spaces are not trimmed (with LTRIM)
SELECT LTRIM(TestField1) COLLATE Latin1_General_CS_AS AS F1,  LTRIM(TestFiel2) COLLATE Latin1_General_CS_AS AS F2
FROM (VALUES
    ('JOHN', ' ZOE'),  -- 1 extra space 
    ('JOHN', 'Albano')
) AS TestTable(TestField1, TestFiel2)
ORDER BY F1, F2 
/*
F1   F2
---- ------
JOHN Albano
JOHN ZOE
*/

Note: I would use solution described in Scenario #1.

Upvotes: 3

jtimperley
jtimperley

Reputation: 2544

Strings are sorted by comparing each character based on a modified ASCII table. If the first characters match then the next is compared and so on until an order can be determined. SQL server's Latin1_General_CS_AS considers 'E' and 'e' to be the same character so it moves on until the next character difference which is your space and 'a'. The space is 32 on the ASCII table while 'a' is considered to be 65 (same as 'A' in this scenario). Since the space (32) is less than the 'a' (65), 'Ne a' is sorted before 'NE a'.

http://en.wikipedia.org/wiki/ASCII

Upvotes: 0

marc_s
marc_s

Reputation: 755157

Your second sample has two spaces in the value NE a - therefore, this will be ordered before the Ne a value with just a single space (because the (second) space comes before the a value).

If you reduce the second value to also include just a single space, you'll get the same ordering as in smaple #1:

SELECT TestField 
FROM (VALUES('Ne a'), ('NE a')) AS TestTable(TestField)
ORDER BY TestField COLLATE Latin1_General_CS_AS

Output:

Ne a
NE a

Upvotes: 1

Related Questions