Reputation: 75
I have a column named MR
which is a varchar. When I run a query with an ORDER BY
it doesn't seem to be ordered correctly.
select MR, LName, FName from users
where MR between 'MR20001' and 'MR20002'
order by MR
Results:
MR20001 | LINA | MARY
MR200011 | TEST | CASE
MR20002 | KO | MIKE
Why does MR200011
show before MR20002
?
Upvotes: 0
Views: 186
Reputation: 280625
Because MR
is a string, and - for example - 24
sorts lower than 3
since it does not care about the numeric value. This would be like sorting Smith
before Azlea
because z
> m
.
If you want to treat just the number as a number, then perhaps don't store the MR
prefix. Based on the column name this seems completely redundant. Why not store the numeric portion alone as an INT
and create a view that appends the 'MR'
at runtime? You can easily do this without really affecting the app (add an instead of trigger if you can't control insert/update operations via a stored procedure):
CREATE VIEW dbo.users_appended
AS
SELECT MR = 'MR' + CONVERT(VARCHAR(25), MR),
MRSort = MR --, ... other columns ...
FROM dbo.users;
GO
SELECT MR, other columns
FROM dbo.users_appended
ORDER BY MRSort;
If you can't change the schema, you can say:
ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
But I really don't think you should have MR
stored in there at all. If you can't change this then maybe consider a view or a computed column that pulls out the numeric part of the string. You can even index a computed column if you are going to commonly order in only one direction.
ALTER TABLE dbo.users ADD MRNumber
AS (CONVERT(BIGINT, SUBSTRING(MR, 3, 25))) PERSISTED;
CREATE INDEX ix_mrnumber ON dbo.users(MRNumber);
You will have to test if the work required to maintain the computed column and the index are justified by the difference this makes to the query.
A view would be similar, but you wouldn't gain any efficiencies from an index:
CREATE VIEW dbo.users_extended
AS
SELECT MR, ..., MRNumber = CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
GO
SELECT MR, ...
FROM dbo.users_extended
ORDER BY MRNumber;
As for using LEN
instead, be careful. While it is simpler code, it is not necessarily more efficient. On my system I created two tables with a wide distribution of values:
SELECT 'MR'+RTRIM(ABS(object_id)) AS MR
INTO dbo.flab
FROM sys.all_objects -- 2096 rows
SELECT 'MR'+RTRIM(ABS(s1.object_id)) AS MR
INTO dbo.mort
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2; -- 4397409 rows
Now, testing simple queries like this:
SELECT * FROM dbo.flab ORDER BY LEN(MR), MR;
SELECT * FROM dbo.flab ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
SELECT * FROM dbo.mort ORDER BY LEN(MR), MR;
SELECT * FROM dbo.mort ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
Results on a heap (pay close attention to duration and CPU, in spite of the nonsense SQL Server spits out in terms of estimated cost):
And with a clustered index on MR
:
I've also changed all my calculations to BIGINT
to avoid any potential danger that the substring is more than 12 characters (and still avoids an expensive - yes, expensive - LEN()
). Note that the estimated costs are 50/50 and the duration differences are about the same if INT
is used instead of BIGINT
(assuming that it is safe to use INT
- which I think is a safe assumption since the accepted answer would have failed if there were any larger values).
Upvotes: 5
Reputation: 11813
To avoid all the expensive casting and replacing you can use this as an alternative:
select MR, LName, FName
from Table1
order by LEN(MR),MR
http://www.sqlfiddle.com/#!3/ae729/6
Upvotes: 0
Reputation: 263893
Try casting it,
select MR, LName, FName
from users
where MR between 'MR20001' and 'MR20002'
order by CAST(REPLACE(MR, 'MR', '') AS INT)
Upvotes: 1