jr17
jr17

Reputation: 75

Ordering strings that contain numbers

I have a column named MR which is a varchar. When I run a query with an ORDER BY it doesn't seem to be ordered correctly.

select MR, LName, FName from users 
  where MR between 'MR20001' and 'MR20002' 
  order by MR

Results:

MR20001   | LINA  | MARY
MR200011  | TEST  | CASE
MR20002   | KO    | MIKE

Why does MR200011 show before MR20002?

Upvotes: 0

Views: 186

Answers (3)

Aaron Bertrand
Aaron Bertrand

Reputation: 280625

Because MR is a string, and - for example - 24 sorts lower than 3 since it does not care about the numeric value. This would be like sorting Smith before Azlea because z > m.

If you want to treat just the number as a number, then perhaps don't store the MR prefix. Based on the column name this seems completely redundant. Why not store the numeric portion alone as an INT and create a view that appends the 'MR' at runtime? You can easily do this without really affecting the app (add an instead of trigger if you can't control insert/update operations via a stored procedure):

CREATE VIEW dbo.users_appended
AS
    SELECT MR = 'MR' + CONVERT(VARCHAR(25), MR), 
        MRSort = MR --, ... other columns ...
      FROM dbo.users;
GO

SELECT MR, other columns
  FROM dbo.users_appended
  ORDER BY MRSort;

If you can't change the schema, you can say:

ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));

But I really don't think you should have MR stored in there at all. If you can't change this then maybe consider a view or a computed column that pulls out the numeric part of the string. You can even index a computed column if you are going to commonly order in only one direction.

ALTER TABLE dbo.users ADD MRNumber
  AS (CONVERT(BIGINT, SUBSTRING(MR, 3, 25))) PERSISTED;

CREATE INDEX ix_mrnumber ON dbo.users(MRNumber);

You will have to test if the work required to maintain the computed column and the index are justified by the difference this makes to the query.

A view would be similar, but you wouldn't gain any efficiencies from an index:

CREATE VIEW dbo.users_extended
AS
  SELECT MR, ..., MRNumber = CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
GO

SELECT MR, ... 
  FROM dbo.users_extended
  ORDER BY MRNumber;

As for using LEN instead, be careful. While it is simpler code, it is not necessarily more efficient. On my system I created two tables with a wide distribution of values:

SELECT 'MR'+RTRIM(ABS(object_id)) AS MR 
  INTO dbo.flab 
  FROM sys.all_objects -- 2096 rows

SELECT 'MR'+RTRIM(ABS(s1.object_id)) AS MR 
  INTO dbo.mort 
  FROM sys.all_objects AS s1
  CROSS JOIN sys.all_objects AS s2; -- 4397409 rows

Now, testing simple queries like this:

SELECT * FROM dbo.flab ORDER BY LEN(MR), MR;
SELECT * FROM dbo.flab ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));
SELECT * FROM dbo.mort ORDER BY LEN(MR), MR;
SELECT * FROM dbo.mort ORDER BY CONVERT(BIGINT, SUBSTRING(MR, 3, 25));

Results on a heap (pay close attention to duration and CPU, in spite of the nonsense SQL Server spits out in terms of estimated cost):

enter image description here

And with a clustered index on MR:

enter image description here

I've also changed all my calculations to BIGINT to avoid any potential danger that the substring is more than 12 characters (and still avoids an expensive - yes, expensive - LEN()). Note that the estimated costs are 50/50 and the duration differences are about the same if INT is used instead of BIGINT (assuming that it is safe to use INT - which I think is a safe assumption since the accepted answer would have failed if there were any larger values).

Upvotes: 5

Sebastian Meine
Sebastian Meine

Reputation: 11813

To avoid all the expensive casting and replacing you can use this as an alternative:

select MR, LName, FName 
from Table1 
order by LEN(MR),MR

http://www.sqlfiddle.com/#!3/ae729/6

Upvotes: 0

John Woo
John Woo

Reputation: 263893

Try casting it,

select MR, LName, FName 
from users 
where MR between 'MR20001' and 'MR20002' 
order by CAST(REPLACE(MR, 'MR', '') AS INT)

Upvotes: 1

Related Questions