dimo414
dimo414

Reputation: 48864

Custom ORDER BY to ignore 'the'

I'm trying to sort a list of titles, but currently there's a giant block of titles which start with 'The '. I'd like the 'The ' to be ignored, and the sort to work off the second word. Is that possible in SQL, or do I have to do custom work on the front end?

For example, current sorting:

Would be better sorted:

Almost as if the records were stored as 'Fountain, The', and the like. But I don't want to store them that way if I can, which is of course the crux of the problem.

Upvotes: 17

Views: 9128

Answers (7)

Aiken
Aiken

Reputation: 11

Ways that will only remove the first The:

=SUBSTITUTE(A1,"The ","",1) OR more reliably: 

=IF(IF(LEFT(A1,4)="The ",TRUE)=TRUE,RIGHT(A1,(LEN(A1)-4)),A1)

Second one is basically saying if the first left digit equals The, then check how many digits are in the cell, and show only the the right hand digits excluding The.

Upvotes: 1

sergey.n
sergey.n

Reputation: 1747

For SQLite

ORDER BY CASE WHEN LOWER(SUBSTR(title,1,4)) = 'the ' THEN SUBSTR(title,5) ELSE title END ASC

Upvotes: 1

great_llama
great_llama

Reputation: 11739

Something like:

ORDER BY IF(LEFT(title,2) = "A ",
            SUBSTRING(title FROM 3),
            IF(LEFT(title,3) = "An ",
              SUBSTRING(title FROM 4),
              IF(LEFT(title,4) = "The ",
                 SUBSTRING(title FROM 5),
                 title)))

But given the overhead of doing this more than a few times, you're really better off storing the title sort value in another column...

Upvotes: 9

Dave W. Smith
Dave W. Smith

Reputation: 24966

You can certainly arrange dynamically strip off 'The', though you'll soon find that you have to deal with 'A' and 'An' (except for the special case of titles like "A is for Alibi"). When "foreign" films enter the mix, you'll need to cope with "El" and "La" (except for that pesky edge case, "LA Story"). Then mix in some German films, and you'll need to cope with 'Der' and 'Die' (except for that pesky set of 'Die Hard' edge cases). See the pattern? You're headed down a path that keeps getting longer and more pitted with special cases.

The way forward on this that avoids an ever-growing set of special cases is to store the title as you want it display and store the title as you want it sorted.

Upvotes: 3

Rob Farley
Rob Farley

Reputation: 15849

Best is to have a computed column to do this, so that you can index the computed column and order by that. Otherwise, the sort will be a lot of work.

So then you can have your computed column as:

CASE WHEN title LIKE 'The %' THEN stuff(title,1,4,'') + ', The' ELSE title END

Edit: If STUFF isn't available in MySQL, then use RIGHT or SUBSTRING to remove the leading 4 characters. But still try to use a computed column if possible, so that indexing can be better. The same logic should be applicable to rip out "A " and "An ".

Rob

Upvotes: 12

OMG Ponies
OMG Ponies

Reputation: 332661

The best way to handle this would be to have a column that contains the value you want to use specifically for ordering output. Then you'd just have to use:

  SELECT t.title
    FROM MOVIES t
ORDER BY t.order_title

There are going to be various rules about what should and should not be used to order titles.

Based on your example, an alternative would be to use something like:

   SELECT t.title
    FROM MOVIES t
ORDER BY SUBSTR(t.title, INSTR(t.title, 'The '))

You could use a CASE statement to contain the various rules.

Upvotes: 3

MusiGenesis
MusiGenesis

Reputation: 75336

I think you could do something like

ORDER BY REPLACE(TITLE, 'The ', '')

although this would replace any occurrence of 'The ' with '', not just the first 'The ', although I don't think this would affect very much.

Upvotes: 5

Related Questions