Reputation: 16991
Does T-SQL have any ability to do custom comparison of strings (for sorting), equivalent to .NET's IComparer
?
Something like the ability to give Order By
a user defined function that takes 2 strings and returns a value representing how they compare (greater than, less that, equal to)?
I currently have a C# ICompararer
implementation that is being used to sort things in code, but now I need to produce this same sorted output from a stored proc.
For reference, this is the IComparer
that I am trying to implement in TSQL.
public class SemanticComparer : IComparer<string>
{
private static Regex _splitter = new Regex("\\W+");
public int Compare(string x, string y)
{
string[] partsX = _splitter.Split(x);
string[] partsY = _splitter.Split(y);
int shortest = Math.Min(partsX.Length, partsY.Length);
for (int index = 0; index < shortest; index++)
{
int intX, intY;
int result;
if (int.TryParse(partsX[index], out intX) && int.TryParse(partsY[index], out intY))
{
result = intX.CompareTo(intY);
}
else
{
result = string.Compare(partsX[index], partsY[index], StringComparison.Ordinal);
}
if (result != 0)
{
return result;
}
}
return 0;
}
}
It would need to be able to sort things that look like this (in order as they should be output):
Where each non-word character splits the string in to segments, tries to compare them numerically if possible, and then as strings if not. There can be any "depth" to the number of segments.
CLR Stored Procedures are, unfortunately, not an option.
Upvotes: 1
Views: 91
Reputation: 17915
with data as (
select c from (values
('9.1'), ('9.2'), ('9.2.1'), ('9.02.2'), ('9.02.3'), ('9.3'), ('9.3.1'), ('9.3.2'),
('10.'), ('11.'), ('11.1'), ('11.2.a'), ('11.2.b'), ('11.2.c'), ('11a.2.a'), ('11b.2.b')
) t(c)
)
select c, '[' +
right('00000' +
substring(c, 1, charindex('.', c + '.0.0', 1) - 1) +
case when substring(c + '.0.0', charindex('.', c + '.0.0', 1) - 1, 1) between '0' and '9' then ' ' else '' end,
6
) + '.' +
right('00000' +
substring(c, charindex('.', c + '.0.0', 1) + 1, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) - charindex('.', c + '.0.0', 1) - 1) +
case when right('0' +
substring(c, charindex('.', c + '.0.0', 1) + 1, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) - charindex('.', c + '.0.0', 1) - 1),
1
) between '0' and '9' then ' ' else '' end,
6
) + '.' +
right('00000' +
substring(c, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) + 1, 10) +
case when right('0' +
substring(c, charindex('.', c + '.0.0', charindex('.', c + '.0.0', 1) + 1) + 1, 10),
1
) between '0' and '9' then ' ' else '' end,
6
) + ']'
from data
order by 2;
I threw this together for fun. As you can see, string parsing in SQL is usually a lot of work and has a lot of repeated subexpressions.
The idea here is that it appears you can transform your values in a normalized format that is sortable via a regular alphabetic sort. The front numeric portion is zero-padded. A single alphabetic character is allowed at the end of each "field", otherwise a space is appended. As one example 11.2.a
becomes [00011 .00002 .00000a]
It's fairly flexible and you could certainly wrap up this logic in a scalar function. I'll let you decide whether this whole thing is a good idea.
http://rextester.com/LYL6977 (Slightly improved?)
Upvotes: 2