Squatch
Squatch

Reputation: 1027

SQLite - Sort alphanumerically while ignoring special characters and "The"

I am working on an iOS app that uses FMDB to store a list of books with titles that must be sorted alphanumerically (alpha first, numbers second). However, I must also ignore a non-alphanumeric leading character such as an asterisk or a parenthesis. I must also ignore "The". In addition, this sorted data will be displayed on an indexed UITableView so I need to be able get a list of the leading characters and the number of book titles that go into each section.

Example-

This is a list of book titles:

"A Title 1", "A Title 2", "The A Title 3", "B Title 1", "B Title 2", "B Title 3", 
"*B Title 4", "C Title 1", "(The) C Title 2", "3 Title 1"

From this list I would count 4 sections to put the titles in ('A', 'B', 'C', '3'). I would also count the number of books that go into each section ('A'-3 books, 'B'-4 books, 'C'-2 books, '3'-1 book). Using this tutorial I was able to set up the sections and indexing but am having trouble dealing with titles like "The A Title 3", "*B Title 4", and "(The) C Title 2".

Currently, this this the statement I am using:

SELECT DISTINCT UPPER(SUBSTR(BookTitle, 1, 1)) AS Section, 
COUNT(BookTitle) AS SectionCount 
FROM BookTable 
GROUP BY Section ORDER BY IFNULL(Section + 1, 0), Section

This returns a list of sections sorted alphanumerically and the number of books that go into each section. The issue is that my 'T' section has far too many books in it because it is getting all books with "The" as the leading characters. Not to mention I am getting sections for special characters like < or * when I would like to ignore these. I am thinking that I need to trim off any non-alphanumeric characters and any occurrence of "The " but I don't know how integrate it into my query as it is now.

I hope this is detailed enough. Please feel free to ask me to clear up anything. As always, thanks in advance.

Upvotes: 3

Views: 2223

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270463

String manipulation in SQL is usually "back-to-basics" when compared to other languages. Here is one approach you might take:

select upper(substring(titleclean, 1, 1)) as Section, count(*)
from (select bt.*,
             (case when upper(substring(titletrim, 1, 4)) = 'THE ' or
                        upper(sutring(titletim, 1, 5)) = 'THE) '
                   then substring(titletrim, 5, 1000)
                   else titletrim
              end) as titleclean
      from (select bt.*, ltrim(title, '()*') as titletrim
            from booktable bt
           ) bt
     ) bt
group by upper(substring(titleclean, 1, 1))
order by (case when substring(titleclean, 1, 1) between '0' and '9' then 1
               else 0
          end),
         Section

This is doing the string manipulations by assigning variables in subqueries. I think these follow the rules that you want. The innermost ltrim() is removing the initial variables that you do not want.

I changed the expression "ifnull(section+1, 0)" to a clearer range comparison. Your version might work in SQLite. In most databases, it will generate an error for non-numeric sections.

Upvotes: 1

cegfault
cegfault

Reputation: 6632

The best option might be to check for these conditions on insert, and put "The" at the end of the title (eg, "A Title 3, The"). You can write a simple program to update the current database and mote "The" to the end.

But if you really want to keep it this way, look into case expressions. Here is a related topic: Does sqlite support any kind of IF(condition) statement in a select

Basically, check the case for "The" or special characters, and use that case in your SUBSTR.

Upvotes: 1

Related Questions