Reputation: 1
I'm looking for a collation that sorts by ASCII but case insensitive but I am not having any luck finding one. To be explicit, the order I'm looking for is...
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
[
\
]
^
_
`
A
a
B
b
C
c
D
d
E
e
F
f
G
g
H
h
I
i
J
j
K
k
L
l
M
m
N
n
O
o
P
p
Q
q
R
r
S
s
T
t
U
u
V
v
W
w
X
x
Y
y
Z
z
{
|
}
~
This would also seem to fit into (for this set of characters) Windows Code Page 1252 but I have to admit, I'm not expert of code pages and could be reading this wrong. https://en.wikipedia.org/wiki/Windows-1252
I've tried a number of collations in queries to test them out...
SQL_Latin1_General_CP1_CI_AS
SQL_Latin1_General_CP850_CI_AI
SQL_Latin1_General_CP850_CS_AS
SQL_Latin1_General_CP850_CS_AS
SQL_Latin1_General_CP437_CI_AS
But they are all getting similar results for this set and none of them are what I'm looking for. Some of the Binary sorts (ie SQL_Latin1_General_CP850_BIN
) come close but they aren't case insensitive.
I can do it through a query like...
SELECT char_test
FROM code_page_test
ORDER BY ASCII(LOWER (char_test))
but is there a collation setting to get this sort order?
Thanks.
Upvotes: -3
Views: 132
Reputation: 10024
As others have commented, there is no collation that does what you want.
Scroll down to the better option 2 alternative below.
However, you can accomplish this by creating a sort string where each character c
is replaced by LOWER(c) + c
, and applying COLLATE Latin1_General_BIN
. This will effectively sort each character in ASCII order, case insensitive and then sort uppercase before lowercase before moving to the next character.
A variation might be to just replace alphabetic characters (upper and lower) in the manner described above.
SELECT *
FROM Data D
CROSS APPLY (
SELECT
CASE WHEN D.String IS NOT NULL
THEN STRING_AGG(LOWER(c) + c, '') WITHIN GROUP(ORDER BY S.value)
END
COLLATE Latin1_General_BIN
AS SortString
FROM GENERATE_SERIES(1, LEN(D.String)) S
CROSS APPLY(SELECT SUBSTRING(D.String, S.value, 1) AS c) C
) S
ORDER BY S.SortString
The IS NULL
check prevents a null string from sorting the same as an empty or blank string.
The above uses the GENERATE_SERIES()
function that is only available in SQL Server 2022 (and later). If you are using an older SQL Server version, you will need to find an alternate number generator. There are many out there, if you search.
From a performance perspective, this is a very expensive operation. You might want to look at adding a persistent computed column to manage the sort-string value.
Sample results:
Code | String | SortString |
---|---|---|
0 | null | null |
0 | ||
32 | ||
33 | ! | !! |
34 | " | "" |
35 | # | ## |
... | ... | ... |
47 | / | // |
48 | 0 | 00 |
... | ... | ... |
57 | 9 | 99 |
58 | : | :: |
59 | ; | ;; |
60 | < | << |
... | ... | ... |
96 | ` | `` |
65 | A | aA |
0 | AA | aAaA |
0 | Aa | aAaa |
0 | AB | aAbB |
0 | Ab | aAbb |
97 | a | aa |
0 | aA | aaaA |
0 | aa | aaaa |
0 | aB | aabB |
0 | ab | aabb |
66 | B | bB |
0 | BA | bBaA |
0 | Ba | bBaa |
0 | BB | bBbB |
0 | Bb | bBbb |
98 | b | bb |
0 | bA | bbaA |
0 | ba | bbaa |
0 | bB | bbbB |
0 | bb | bbbb |
67 | C | cC |
... | ... | ... |
90 | Z | zZ |
122 | z | zz |
123 | { | {{ |
124 | | | || |
125 | } | }} |
126 | ~ | ~~ |
O ption2
Another approach that may have better performance and will work with older versions of SQL Server back to 2017 would be to use the TRANSLATE()
function to replace all characters in the range 'A' through 'z' (including several special characters in the middle of that range), with a rearranged set that will then be sorted in the desired order.
DECLARE @TranslateFrom VARCHAR(100)
= '[]^_`AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
DECLARE @TranslateTo VARCHAR(100)
= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz'
SELECT *
FROM Data D
CROSS APPLY (
SELECT
TRANSLATE(D.String COLLATE Latin1_General_BIN, @TranslateFrom, @TranslateTo)
COLLATE Latin1_General_BIN
AS SortString
) SS
ORDER BY SS.SortString
The translate strings were generated with:
SELECT STRING_AGG(c, '') WITHIN GROUP(ORDER BY LOWER(c), c) AS TranslateFrom
FROM GENERATE_SERIES(ASCII('A'), ASCII('z')) S
CROSS APPLY(SELECT CHAR(S.value) COLLATE Latin1_General_BIN AS c) C
SELECT STRING_AGG(c, '') WITHIN GROUP(ORDER BY c) AS TranslateTo
FROM GENERATE_SERIES(ASCII('A'), ASCII('z')) S
CROSS APPLY(SELECT CHAR(S.value) COLLATE Latin1_General_BIN AS c) C
The generated sort strings may appear unreadable, the the sort works:
Sample results (abbreviated):
Code | String | SortString | |
---|---|---|---|
... | ... | ... | |
91 | [ | A | |
93 | ] | B | |
94 | ^ | C | |
95 | _ | D | |
96 | ` | E | |
65 | A | F | |
0 | AA | FF | |
0 | Aa | FG | |
0 | AB | FH | |
0 | Ab | FI | |
97 | a | G | |
0 | aA | GF | |
0 | aa | GG | |
0 | aB | GH | |
0 | ab | GI | |
66 | B | H | |
0 | BA | HF | |
0 | Ba | HG | |
0 | BB | HH | |
0 | Bb | HI | |
98 | b | I | |
0 | bA | IF | |
0 | ba | IG | |
0 | bB | IH | |
0 | bb | II | |
67 | C | J | |
99 | c | K | |
... | ... | ... | |
88 | X | u | |
120 | x | v | |
89 | Y | w | |
121 | y | x | |
90 | Z | y | |
122 | z | z | |
123 | { | { | |
124 | |||
125 | } | } | |
126 | ~ | ~ |
See this db<>fiddle for a demo with the complete results.
Upvotes: 0