sd.
sd.

Reputation: 1

What Microsoft SQL Server collation sorts by ASCII case insensitive?

I'm looking for a collation that sorts by ASCII but case insensitive but I am not having any luck finding one. To be explicit, the order I'm looking for is...

!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
[
\
]
^
_
`
A
a
B
b
C
c
D
d
E
e
F
f
G
g
H
h
I
i
J
j
K
k
L
l
M
m
N
n
O
o
P
p
Q
q
R
r
S
s
T
t
U
u
V
v
W
w
X
x
Y
y
Z
z
{
|
}
~

This would also seem to fit into (for this set of characters) Windows Code Page 1252 but I have to admit, I'm not expert of code pages and could be reading this wrong. https://en.wikipedia.org/wiki/Windows-1252

I've tried a number of collations in queries to test them out... SQL_Latin1_General_CP1_CI_AS
SQL_Latin1_General_CP850_CI_AI
SQL_Latin1_General_CP850_CS_AS
SQL_Latin1_General_CP850_CS_AS
SQL_Latin1_General_CP437_CI_AS

But they are all getting similar results for this set and none of them are what I'm looking for. Some of the Binary sorts (ie SQL_Latin1_General_CP850_BIN) come close but they aren't case insensitive.

I can do it through a query like...

SELECT char_test
FROM code_page_test
ORDER BY ASCII(LOWER (char_test)) 

but is there a collation setting to get this sort order?

Thanks.

Upvotes: -3

Views: 132

Answers (1)

T N
T N

Reputation: 10024

As others have commented, there is no collation that does what you want.

Scroll down to the better option 2 alternative below.

However, you can accomplish this by creating a sort string where each character c is replaced by LOWER(c) + c, and applying COLLATE Latin1_General_BIN. This will effectively sort each character in ASCII order, case insensitive and then sort uppercase before lowercase before moving to the next character.

A variation might be to just replace alphabetic characters (upper and lower) in the manner described above.

SELECT *
FROM Data D
CROSS APPLY (
    SELECT
        CASE WHEN D.String IS NOT NULL 
            THEN STRING_AGG(LOWER(c) + c, '') WITHIN GROUP(ORDER BY S.value)
            END
            COLLATE Latin1_General_BIN
            AS SortString
    FROM GENERATE_SERIES(1, LEN(D.String)) S
    CROSS APPLY(SELECT SUBSTRING(D.String, S.value, 1) AS c) C
)  S
ORDER BY S.SortString

The IS NULL check prevents a null string from sorting the same as an empty or blank string.

The above uses the GENERATE_SERIES() function that is only available in SQL Server 2022 (and later). If you are using an older SQL Server version, you will need to find an alternate number generator. There are many out there, if you search.

From a performance perspective, this is a very expensive operation. You might want to look at adding a persistent computed column to manage the sort-string value.

Sample results:

Code String SortString
0 null null
0
32   
33 ! !!
34 " ""
35 # ##
... ... ...
47 / //
48 0 00
... ... ...
57 9 99
58 : ::
59 ; ;;
60 < <<
... ... ...
96 ` ``
65 A aA
0 AA aAaA
0 Aa aAaa
0 AB aAbB
0 Ab aAbb
97 a aa
0 aA aaaA
0 aa aaaa
0 aB aabB
0 ab aabb
66 B bB
0 BA bBaA
0 Ba bBaa
0 BB bBbB
0 Bb bBbb
98 b bb
0 bA bbaA
0 ba bbaa
0 bB bbbB
0 bb bbbb
67 C cC
... ... ...
90 Z zZ
122 z zz
123 { {{
124 | ||
125 } }}
126 ~ ~~

O ption2

Another approach that may have better performance and will work with older versions of SQL Server back to 2017 would be to use the TRANSLATE() function to replace all characters in the range 'A' through 'z' (including several special characters in the middle of that range), with a rearranged set that will then be sorted in the desired order.

DECLARE @TranslateFrom VARCHAR(100)
      = '[]^_`AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
DECLARE @TranslateTo VARCHAR(100)
      = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz'

SELECT *
FROM Data D
CROSS APPLY (
    SELECT
        TRANSLATE(D.String COLLATE Latin1_General_BIN, @TranslateFrom, @TranslateTo)
            COLLATE Latin1_General_BIN
            AS SortString
) SS
ORDER BY SS.SortString

The translate strings were generated with:

SELECT STRING_AGG(c, '') WITHIN GROUP(ORDER BY LOWER(c), c) AS TranslateFrom
FROM GENERATE_SERIES(ASCII('A'), ASCII('z')) S
CROSS APPLY(SELECT CHAR(S.value) COLLATE Latin1_General_BIN AS c) C

SELECT STRING_AGG(c, '') WITHIN GROUP(ORDER BY c) AS TranslateTo
FROM GENERATE_SERIES(ASCII('A'), ASCII('z')) S
CROSS APPLY(SELECT CHAR(S.value) COLLATE Latin1_General_BIN AS c) C

The generated sort strings may appear unreadable, the the sort works:

Sample results (abbreviated):

Code String SortString
... ... ...
91 [ A
93 ] B
94 ^ C
95 _ D
96 ` E
65 A F
0 AA FF
0 Aa FG
0 AB FH
0 Ab FI
97 a G
0 aA GF
0 aa GG
0 aB GH
0 ab GI
66 B H
0 BA HF
0 Ba HG
0 BB HH
0 Bb HI
98 b I
0 bA IF
0 ba IG
0 bB IH
0 bb II
67 C J
99 c K
... ... ...
88 X u
120 x v
89 Y w
121 y x
90 Z y
122 z z
123 { {
124
125 } }
126 ~ ~

See this db<>fiddle for a demo with the complete results.

Upvotes: 0

Related Questions