Reputation: 3388
I have an sqlite table with a few hundred million rows:
sqlite> create table t1(id INTEGER PRIMARY KEY,stuff TEXT );
I need to query this table by its integer primary key hundreds of millions of times. My code:
conn = sqlite3.connect('stuff.db')
with conn:
cur = conn.cursor()
for id in ids:
try:
cur.execute("select stuff from t1 where rowid=?",[id])
stuff_tuple = cur.fetchone()
#do something with the fetched row
except:
pass #for when id is not in t1's key set
Here, ids is a list that may have tens of thousands of elements. Forming t1 did not take very long (ie ~75K inserts per second). Querying t1 the way I've done it is unacceptably slow (ie ~1K queries in 10 seconds).
I am completely new to SQL. What am I doing wrong?
Upvotes: 1
Views: 8178
Reputation: 70552
Since you're retrieving values by their keys, it seems like a key/value store would be more appropriate in this case. Relational databases (Sqlite included) are definitely feature-rich, but you can't beat the performance of a simple key/value store.
There are several to choose from:
And there's many, many more.
Upvotes: 1
Reputation: 14835
You should make one sql call instead, should be must faster
conn = sqlite3.connect('stuff.db')
with conn:
cur = conn.cursor()
for row in cur.execute("SELECT stuff FROM t1 WHERE rowid IN (%s)" % ','.join('?'*len(ids)), ids):
#do something with the fetched row
pass
you do not need a try except since ids not in the db will not show up. If you want to know which ids are not in the results, you can do:
ids_res = set()
for row in c.execute(...):
ids_res.add(row['id'])
ids_not_found = ids_res.symmetric_difference(ids)
Upvotes: 0