Reputation: 1027
I am working on an iOS app that uses FMDB to store a list of books with titles that must be sorted alphanumerically (alpha first, numbers second). However, I must also ignore a non-alphanumeric leading character such as an asterisk or a parenthesis. I must also ignore "The". In addition, this sorted data will be displayed on an indexed UITableView so I need to be able get a list of the leading characters and the number of book titles that go into each section.
Example-
This is a list of book titles:
"A Title 1", "A Title 2", "The A Title 3", "B Title 1", "B Title 2", "B Title 3",
"*B Title 4", "C Title 1", "(The) C Title 2", "3 Title 1"
From this list I would count 4 sections to put the titles in ('A', 'B', 'C', '3'). I would also count the number of books that go into each section ('A'-3 books, 'B'-4 books, 'C'-2 books, '3'-1 book). Using this tutorial I was able to set up the sections and indexing but am having trouble dealing with titles like "The A Title 3", "*B Title 4", and "(The) C Title 2".
Currently, this this the statement I am using:
SELECT DISTINCT UPPER(SUBSTR(BookTitle, 1, 1)) AS Section,
COUNT(BookTitle) AS SectionCount
FROM BookTable
GROUP BY Section ORDER BY IFNULL(Section + 1, 0), Section
This returns a list of sections sorted alphanumerically and the number of books that go into each section. The issue is that my 'T' section has far too many books in it because it is getting all books with "The" as the leading characters. Not to mention I am getting sections for special characters like < or * when I would like to ignore these. I am thinking that I need to trim off any non-alphanumeric characters and any occurrence of "The " but I don't know how integrate it into my query as it is now.
I hope this is detailed enough. Please feel free to ask me to clear up anything. As always, thanks in advance.
Upvotes: 3
Views: 2223
Reputation: 1270463
String manipulation in SQL is usually "back-to-basics" when compared to other languages. Here is one approach you might take:
select upper(substring(titleclean, 1, 1)) as Section, count(*)
from (select bt.*,
(case when upper(substring(titletrim, 1, 4)) = 'THE ' or
upper(sutring(titletim, 1, 5)) = 'THE) '
then substring(titletrim, 5, 1000)
else titletrim
end) as titleclean
from (select bt.*, ltrim(title, '()*') as titletrim
from booktable bt
) bt
) bt
group by upper(substring(titleclean, 1, 1))
order by (case when substring(titleclean, 1, 1) between '0' and '9' then 1
else 0
end),
Section
This is doing the string manipulations by assigning variables in subqueries. I think these follow the rules that you want. The innermost ltrim() is removing the initial variables that you do not want.
I changed the expression "ifnull(section+1, 0)" to a clearer range comparison. Your version might work in SQLite. In most databases, it will generate an error for non-numeric sections.
Upvotes: 1
Reputation: 6632
The best option might be to check for these conditions on insert, and put "The" at the end of the title (eg, "A Title 3, The"). You can write a simple program to update the current database and mote "The" to the end.
But if you really want to keep it this way, look into case expressions. Here is a related topic: Does sqlite support any kind of IF(condition) statement in a select
Basically, check the case for "The" or special characters, and use that case in your SUBSTR.
Upvotes: 1