Alex

Reputation: 44749

Large Sqlite database search

How can I implement an efficient search over a large SQLite database (more than 90,000 entries)?

I'm using Python and SQLObject ORM:

    import re
    ...

    def search1():
        cr = re.compile(ur'foo')
        for item in Item.select():
            if cr.search(item.name) or cr.search(item.skim):
                print item.name

This function runs in more than 30 seconds. How should I make it run faster?

UPD: The test:

    for item in Item.select():
        pass

... takes almost the same time as my initial function (0:00:33.093141 vs. 0:00:33.322414), so the regexps account for almost none of the time.

A Sqlite3 shell query:

    select '' from item where name like '%foo%';

runs in about a second, so most of the time is spent in the ORM's inefficient data retrieval from the db. I guess SQLObject fetches entire rows here, while SQLite only touches the necessary fields.
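For comparison, the same query can be timed from Python through the standard library sqlite3 module, bypassing the ORM entirely (the database filename here is an assumption):

    import sqlite3
    from datetime import datetime

    conn = sqlite3.connect('items.db')   # assumed filename
    start = datetime.now()
    rows = conn.execute(
        "SELECT name FROM item WHERE name LIKE '%foo%' OR skim LIKE '%foo%'"
    ).fetchall()
    print datetime.now() - start         # should be close to the ~1 second seen in the shell
    print len(rows)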

Upvotes: 2

Views: 1941

Answers (6)

pboucher

Reputation: 352

Given your example, and expanding on Reed's answer, your code could look a bit like the following:

    import re
    import sqlalchemy.sql.expression as expr

    ...

    def search1():
        searchStr = ur'foo'
        whereClause = expr.or_(itemsTable.c.nameColumn.contains(searchStr),
                               itemsTable.c.skimColumn.contains(searchStr))
        for item in Items.select().where(whereClause):
            print item.name

which translates to

    SELECT * FROM items WHERE name LIKE '%foo%' OR skim LIKE '%foo%'

This has the database do all the filtering work for you instead of fetching all 90,000 records and running up to two regex operations on each one.

You can find some info on the .contains() method here.

There is also the SQLAlchemy SQL Expression Language Tutorial here.

Of course, the example above assumes variable names for your itemsTable and the columns it has (nameColumn and skimColumn).
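For a self-contained illustration, here is roughly how the table and query could be wired up with the SQLAlchemy Core expression language of that era (the database filename, engine URL, and column types are assumptions; the item table and its name/skim columns come from the question):

    import sqlalchemy.sql.expression as expr
    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

    # Assumed schema -- adjust names and types to match the real database.
    engine = create_engine('sqlite:///items.db')
    metadata = MetaData()
    items_table = Table('item', metadata,
                        Column('id', Integer, primary_key=True),
                        Column('name', String),
                        Column('skim', String))

    search_str = 'foo'
    where_clause = expr.or_(items_table.c.name.contains(search_str),
                            items_table.c.skim.contains(search_str))

    # The LIKE filtering happens inside SQLite; only matching rows come back.
    for row in engine.execute(select([items_table]).where(where_clause)):
        print row.name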

Upvotes: 0

code43

Reputation: 11

Initially, doing regex via Python was considered for y_serial, but that was dropped in favor of SQLite's GLOB (which is far faster). GLOB is similar to LIKE except that its syntax is more conventional: * instead of %, and ? instead of _.

See the Endnotes at http://yserial.sourceforge.net/ for more details.
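As an illustration, the GLOB equivalent of the question's LIKE query, run through the stdlib sqlite3 module (the database filename is an assumption):

    import sqlite3

    conn = sqlite3.connect('items.db')   # assumed filename
    # GLOB uses shell-style wildcards (* and ?) and is case sensitive,
    # whereas LIKE uses % and _ and is case insensitive for ASCII.
    rows = conn.execute("SELECT name FROM item WHERE name GLOB '*foo*'").fetchall()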

Upvotes: 0

van

Reputation: 77082

I would definitely take Reed's suggestion to push the filter into the SQL (forget the index part, though).

I do not think that selecting only specific fields rather than all fields makes a difference (unless you have a lot of large fields). I would bet that SQLObject creates/instantiates 90K objects and puts them into a Session/UnitOfWork for tracking, and that could definitely take some time.

Also, if you do not need the objects in your session, there must be a way to select just the fields you need via custom query creation, so that no Item objects are created, only tuples.
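A minimal sketch of that idea, bypassing the ORM entirely with the stdlib sqlite3 driver so that only plain tuples of the two needed columns come back (the database filename is an assumption):

    import sqlite3

    conn = sqlite3.connect('items.db')   # assumed filename
    # Only name and skim are fetched, as plain tuples; no Item objects
    # are instantiated and nothing is tracked in a session.
    for name, skim in conn.execute("SELECT name, skim FROM item"):
        pass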

Upvotes: 0

S.Lott

Reputation: 392010

30 seconds to fetch 90,000 rows might not be all that bad.

Have you benchmarked the time required to do the following?

    for item in Item.select():
        pass

Just to see if the time is DB time, network time or application time?
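For example, wrapping the loop in a simple datetime-based timer (the same style of timing the question's update reports):

    from datetime import datetime

    start = datetime.now()
    for item in Item.select():
        pass
    print datetime.now() - start   # fetch/ORM time only, no regex work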

If your SQLite DB is physically very large, you could be looking at -- simply -- a lot of physical I/O to read all that database stuff in.

Upvotes: 2

Christopher

Reputation: 9094

If you really need to use a regular expression, there's not really anything you can do to speed that up tremendously.

The best thing would be to write an SQLite function that performs the comparison for you in the db engine instead of in Python.
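With the standard library sqlite3 module, one common way to do this is to register an application-defined function so that SQLite's REGEXP operator (which is undefined by default) calls back into Python; the match itself still runs in Python, but only the matching rows are returned. A sketch, assuming direct access to the database file:

    import re
    import sqlite3

    def regexp(pattern, value):
        # SQLite calls this once per row via the REGEXP operator.
        return value is not None and re.search(pattern, value) is not None

    conn = sqlite3.connect('items.db')            # assumed filename
    conn.create_function('REGEXP', 2, regexp)     # backs "x REGEXP y"

    cursor = conn.execute(
        "SELECT name FROM item WHERE name REGEXP ? OR skim REGEXP ?",
        ('foo', 'foo'))
    for (name,) in cursor:
        print name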

You could also switch to a db server like PostgreSQL, which has support for SIMILAR TO.

http://www.postgresql.org/docs/8.3/static/functions-matching.html

Upvotes: 0

Reed Copsey

Reputation: 564851

The best way would be to rework your logic to do the selection in the database instead of in your python program.

Instead of doing Item.select(), you should rework it to do Item.select("""name LIKE ....

If you do this, and make sure you have the name and skim columns indexed, it will return very quickly. 90000 entries is not a large database.
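A sketch of what that could look like, passing the WHERE clause to SQLObject as a raw SQL string (quoting/escaping of the search term is left to the caller here):

    # Only the rows that match ever leave SQLite, so only those
    # get turned into Item objects.
    for item in Item.select("name LIKE '%foo%' OR skim LIKE '%foo%'"):
        print item.name

If you'd rather not build SQL strings by hand, SQLObject's sqlbuilder module should let you express the same OR/LIKE clause programmatically.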

Upvotes: 3
