Gahiggidy
Gahiggidy

Reputation: 98

SQL: Order by substring of mixed letters and numbers

How might I select the following data in an existing table and order by a mix of letters and numbers. Here is the sample...

A-1
A-10
A-2
A-3
A-4
A-5
A-6
A-7
A-8
A-9
A-3a
A-3b
A-3c
B-1
B-10
B-11
B-12
B-12a
B-12b
B-13
B-2
B-3
B-4
B-5
B-6
B-7
B-8
B-9

Upvotes: 1

Views: 306

Answers (3)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67301

I place this as a new answer, as it is not really an answer but rather a comparison of different approaches:

The conclusio:

  • all approaches scale fairly linear, except XML
  • XML is fastest with small row count but gets worse with high row count

Create a test scenario

CREATE TABLE #tbl (ID INT IDENTITY,sortColumn VARCHAR(100));
INSERT INTO #tbl VALUES
 ('A-1')
,('A-10')
,('A-2')
,('A-3')
,('A-4')
,('A-5')
,('A-6')
,('A-7')
,('A-8')
,('A-9')
,('A-3a')
,('A-3b')
,('A-3c')
,('B-1')
,('B-10')
,('B-11')
,('B-12')
,('B-12a')
,('B-12b')
,('B-13')
,('B-2')
,('B-3')
,('B-4')
,('B-5')
,('B-6')
,('B-7')
,('B-8')
,('A-8a')
,('B-8')
,('B-9'); --30 rows
GO 1000  -- x 1.000 = 30.000 rows

Matt's approach (cleaned to the necessary)

  • 46 seconds on 3 mio rows
  • 4.5 seconds on 300.000 rows
  • 1.3 seconds on 30.000 rows
  • 0.7 seconds on 3.000 rows

The code

SELECT ID,sortColumn
FROM
    #tbl
ORDER BY
LEFT(sortColumn,CHARINDEX('-',sortColumn) -1)
,CAST((CASE
    WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,3)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,3)
    WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,2)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,2)
    WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,1)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,1)
    ELSE NULL
END) AS INT)
,RIGHT(sortColumn,
    LEN(sortColumn) - 
    LEN(LEFT(sortColumn,CHARINDEX('-',sortColumn) -1)) 
    - LEN(CASE
            WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,3)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,3)
            WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,2)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,2)
            WHEN ISNUMERIC(SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,1)) = 1 THEN SUBSTRING(sortColumn,CHARINDEX('-',sortColumn) + 1,1)
            ELSE NULL
    END)
    - 1 --the '-'
),ID;

Stepwise calculation in CROSS APPLYs, sorting on calculated columns

  • 44 seconds on 3 mio rows
  • 4.4 seconds on 300.000 rows
  • 0.9 seconds on 30.000 rows
  • 0.3 seconds on 3.000 rows

The code

SELECT ID,sortColumn
FROM #tbl
CROSS APPLY(SELECT CHARINDEX('-',sortColumn) AS posMinus) AS pos
CROSS APPLY(SELECT SUBSTRING(sortColumn,1,posMinus-1) AS part1
                  ,SUBSTRING(sortColumn,posMinus+1,1000) AS part2
            ) AS parts
CROSS APPLY(SELECT ISNUMERIC(part2) AS p2isnum) AS checknum
CROSS APPLY(SELECT CASE WHEN p2isnum=1 THEN '' ELSE RIGHT(part2,1) END AS part3
                  ,CASE WHEN p2isnum=1 THEN part2 ELSE SUBSTRING(part2,1,LEN(part2)-1) END AS part2New
           ) AS partsNew
ORDER BY part1,part2new,part3,ID;

Stepwise calculation in CROSS APPLYs, sorting on concatenated padded string

  • 42 seconds on 3 mio rows
  • 4.2 seconds on 300.000 rows
  • 0.7 seconds on 30.000 rows
  • 0.4 seconds on 3.000 rows

The code

SELECT ID,sortColumn
FROM #tbl
CROSS APPLY(SELECT CHARINDEX('-',sortColumn) AS posMinus) AS pos
CROSS APPLY(SELECT SUBSTRING(sortColumn,1,posMinus-1) AS part1
                  ,SUBSTRING(sortColumn,posMinus+1,1000) AS part2
            ) AS parts
ORDER BY RIGHT('.....' + part1,5) + RIGHT('.....' + part2,5 - ISNUMERIC(RIGHT(part2,1)))
        ,ID;

Splitting with XML, sorting on concatenated padded string

  • 67 seconds on 3 mio rows
  • 6.2 seconds on 300.000 rows
  • 0.7 seconds on 30.000 rows
  • 0.3 seconds on 3.000 rows

The code

SELECT ID,sortColumn
FROM
(
    SELECT CAST('<r>' + REPLACE(sortColumn,'-','</r><r>') + '</r>' AS XML) AS SortColumnSplitted
          ,*
    FROM #tbl
) AS tbl
ORDER BY RIGHT('.....' + SortColumnSplitted.value('r[1]','varchar(max)'),5) + RIGHT('.....' + SortColumnSplitted.value('r[2]','varchar(max)'),5 - ISNUMERIC(RIGHT(SortColumnSplitted.value('r[2]','varchar(max)'),1)))
        ,ID;

Upvotes: 2

Thomas
Thomas

Reputation: 31

The most powerful solution is to create an SQL CLR function. That's a bit tough, though.

Another approach is writing an insert/update trigger that splits the value in the mixed column with TSQL and stores the three parts (character, number, character) in specific helper columns (that you can use to sort). Based on your examples, you can experiment with the splitting along the lines of this code:

declare @value nvarchar(10) = 'B-12b';

-- first part
select substring(@value, 1, 1)

-- second part
select case when isnumeric(right(@value, 1)) = 1
    then substring(@value, 3, len(@value) - 2)
    else substring(@value, 3, len(@value) - 3)
    end

-- third part
select case when isnumeric(right(@value, 1)) = 1
    then '_'
    else right(@value, 1)
    end

Upvotes: 1

Matt
Matt

Reputation: 14341

I agree with thomas, but I also have a lot of the .Net Regex and String functions exposed via CLR. Other techniques we use a little are user defined functions that recursively go character by character to strip out non desired characters (e.g. no alpha when looking for a number, no number when looking for alpha). But in the particular case you presented if you know the format will be pretty standard you can use a combination of ISNUMERIC, SUBSTRINGS, etc. to reach your goal pretty easily. For example. If you know it is always: Alpha + "-" + Numeric (1-3 digits) + alpha you could do the following and it will sort The alpha as alpha, numeric as numeric, and alpha as alpha.

DECLARE @Values AS TABLE (Value VARCHAR(5))

INSERT INTO @Values (Value)
    VALUES ('A-1')
    ,('A-10')
    ,('A-2')
    ,('A-3')
    ,('A-4')
    ,('A-5')
    ,('A-6')
    ,('A-7')
    ,('A-8')
    ,('A-9')
    ,('A-3a')
    ,('A-3b')
    ,('A-3c')
    ,('B-1')
    ,('B-10')
    ,('B-11')
    ,('B-12')
    ,('B-12a')
    ,('B-12b')
    ,('B-13')
    ,('B-2')
    ,('B-3')
    ,('B-4')
    ,('B-5')
    ,('B-6')
    ,('B-7')
    ,('B-8')
    ,('B-9')

SELECT
    *
    ,FirstAlphaSection = LEFT(Value,CHARINDEX('-',Value) -1)
    ,SecondNumericSection = CASE
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)
       ELSE NULL
    END
    ,ThirdAlphaSection =
       RIGHT(Value,
          LEN(Value) - 
          LEN(LEFT(Value,CHARINDEX('-',Value) -1)) 
          - LEN(CASE
                WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)
                WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)
                WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)
                ELSE NULL
          END)
          - 1 --the '-'
       )
FROM
    @Values
ORDER BY
    LEFT(Value,CHARINDEX('-',Value) -1)
    ,CAST((CASE
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)
       WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)
       ELSE NULL
    END) AS INT)
    ,RIGHT(Value,
       LEN(Value) - 
       LEN(LEFT(Value,CHARINDEX('-',Value) -1)) 
       - LEN(CASE
             WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,3)
             WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,2)
             WHEN ISNUMERIC(SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)) = 1 THEN SUBSTRING(Value,CHARINDEX('-',Value) + 1,1)
             ELSE NULL
       END)
       - 1 --the '-'
    )

Upvotes: 1

Related Questions