Reputation: 103
I need to query a table in an SQLite database to return all the rows in a table that match a given set of words.
To be more precise: I have a database with ~80,000 records in it. One of the fields is a text field with around 100-200 words per record. What I want to be able to do is take a list of 200 single word keywords {"apple", "orange", "pear", ... } and retrieve a set of all the records in the table that contain at least one of the keyword terms in the description column.
The immediately obvious way to do this is with something like this:
SELECT stuff FROM table
WHERE (description LIKE '% apple %') or (description LIKE '% orange %') or ...
If I have 200 terms, I end up with a big and nasty looking SQL statement that seems to me to be clumsy, smacks of bad practice, and not surprisingly takes a long time to process - more than a second per 1000 records.
This answer Better performance for SQLite Select Statement seemed close to what I need, and as a result I created an index, but according to http://www.sqlite.org/optoverview.html sqlite doesn't use any optimisations if the LIKE operator is used with a beginning % wildcard.
Not being an SQL expert, I am assuming I'm doing this the dumb way. I was wondering if someone with more experience could suggest a more sensible and perhaps more efficient way of doing this?
Alternatively, is there a better approach I could use to the problem?
Upvotes: 6
Views: 15325
Reputation: 9885
This is the same problem as full-text search, right? In which case, you need some help from the DB to construct indexes into these fields if you want to do this efficiently. A quick search for SQLite full text search yields this page.
The solution you correctly identify as clumsy is probably going to do up to 200 regular expression matches per document in the worst case (i.e. when a document doesn't match), where each match has to traverse the entire field. Using the index approach will mean that your search speed will be independent of the size of each document.
Upvotes: 1
Reputation: 44173
Sounds like you might want to have a look at Full Text Search. It was contributed to SQLite by someone from google. The description:
allows the user to efficiently query the database for all rows that contain one or more words (hereafter "tokens"), even if the table contains many large documents.
Upvotes: 1
Reputation: 24506
Using the SQLite fulltext search would be faster than a LIKE '%...%' query. I don't think there's any database that can use an index for a query beginning with %, as if the database doesn't know what the query starts with then it can't use the index to look it up.
An alternative approach is putting the keywords in a separate table instead, and making an intermediate table that has the information about which row in your main table has which keywords. If you indexed all the relevant columns that way, it could be queried very quickly.
Upvotes: 4