dranxo
dranxo

Reputation: 3388

How do I speed up my sqlite3 queries in Python?

I have an sqlite table with a few hundred million rows:

sqlite> create table t1(id INTEGER PRIMARY KEY,stuff TEXT );

I need to query this table by its integer primary key hundreds of millions of times. My code:

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()
    for id in ids:
        try:
            cur.execute("select stuff from t1 where rowid=?",[id])
            stuff_tuple = cur.fetchone()
            #do something with the fetched row
        except:
            pass #for when id is not in t1's key set

Here, ids is a list that may have tens of thousands of elements. Forming t1 did not take very long (ie ~75K inserts per second). Querying t1 the way I've done it is unacceptably slow (ie ~1K queries in 10 seconds).

I am completely new to SQL. What am I doing wrong?

Upvotes: 1

Views: 8178

Answers (2)

voithos
voithos

Reputation: 70552

Since you're retrieving values by their keys, it seems like a key/value store would be more appropriate in this case. Relational databases (Sqlite included) are definitely feature-rich, but you can't beat the performance of a simple key/value store.

There are several to choose from:

  • Redis: "advanced key-value store", very fast, optimized for in-memory operation
  • Cassandra: extremely high performance, scalable, used by multiple high-profile sites
  • MongoDB: feature-rich, tries to be "middle ground" between relational and NoSQL (and they've started offering free online classes)

And there's many, many more.

Upvotes: 1

Pykler
Pykler

Reputation: 14835

You should make one sql call instead, should be must faster

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()

    for row in cur.execute("SELECT stuff FROM t1 WHERE rowid IN (%s)" % ','.join('?'*len(ids)), ids):
        #do something with the fetched row
        pass 

you do not need a try except since ids not in the db will not show up. If you want to know which ids are not in the results, you can do:

ids_res = set()
for row in c.execute(...):
    ids_res.add(row['id'])
ids_not_found = ids_res.symmetric_difference(ids)

Upvotes: 0

Related Questions