Reputation: 50970
I am trying to write a multi-threaded Python application in which a single SQlite connection is shared among threads. I am unable to get this to work. The real application is a cherrypy web server, but the following simple code demonstrates my problem.
What change or changes to I need to make to run the sample code, below, successfully?
When I run this program with THREAD_COUNT set to 1 it works fine and my database is updated as I expect (that is, letter "X" is added to the text value in the SectorGroup column).
When I run it with THREAD_COUNT set to anything higher than 1, all threads but 1 terminate prematurely with SQLite related exceptions. Different threads throw different exceptions (with no discernible pattern) including:
OperationalError: cannot start a transaction within a transaction
(occurs on the UPDATE
statement)
OperationalError: cannot commit - no transaction is active
(occurs on the .commit() call)
InterfaceError: Error binding parameter 0 - probably unsupported type.
(occurs on the UPDATE
and the SELECT
statements)
IndexError: tuple index out of range
(this one has me completely puzzled, it occurs on the statement group = rows[0][0] or ''
, but only when multiple threads are running)
Here is the code:
CONNECTION = sqlite3.connect('./database/mydb', detect_types=sqlite3.PARSE_DECLTYPES, check_same_thread = False)
CONNECTION.row_factory = sqlite3.Row
def commands(start_id):
# loop over 100 records, read the SectorGroup column, and write it back with "X" appended.
for inv_id in range(start_id, start_id + 100):
rows = CONNECTION.execute('SELECT SectorGroup FROM Investment WHERE InvestmentID = ?;', [inv_id]).fetchall()
if rows:
group = rows[0][0] or ''
msg = '{} inv {} = {}'.format(current_thread().name, inv_id, group)
print msg
CONNECTION.execute('UPDATE Investment SET SectorGroup = ? WHERE InvestmentID = ?;', [group + 'X', inv_id])
CONNECTION.commit()
if __name__ == '__main__':
THREAD_COUNT = 10
for i in range(THREAD_COUNT):
t = Thread(target=commands, args=(i*100,))
t.start()
Upvotes: 16
Views: 37897
Reputation: 12887
I'm guessing here, but it looks like the reason why you are doing this is a performance concern.
Python threads aren't performant in any meaningful way for this use case. Instead, use sqlite transactions, which are super fast.
If you do all your updates in a transaction, you'll find an order of magnitude speedup.
Upvotes: 0
Reputation: 1121744
It's not safe to share a connection between threads; at the very least you need to use a lock to serialize access. Do also read http://docs.python.org/2/library/sqlite3.html#multithreading as older SQLite versions have more issues still.
The check_same_thread
option appears deliberately under-documented in that respect, see http://bugs.python.org/issue16509.
You could use a connection per thread instead, or look to SQLAlchemy for a connection pool (and a very efficient statement-of-work and queuing system to boot).
Upvotes: 19
Reputation: 6990
I ran into the SqLite threading problem when writing a simple WSGI server for fun and learning. WSGI is multi-threaded by nature when running under Apache. The following code seems to work for me:
import sqlite3
import threading
class LockableCursor:
def __init__ (self, cursor):
self.cursor = cursor
self.lock = threading.Lock ()
def execute (self, arg0, arg1 = None):
self.lock.acquire ()
try:
self.cursor.execute (arg1 if arg1 else arg0)
if arg1:
if arg0 == 'all':
result = self.cursor.fetchall ()
elif arg0 == 'one':
result = self.cursor.fetchone ()
except Exception as exception:
raise exception
finally:
self.lock.release ()
if arg1:
return result
def dictFactory (cursor, row):
aDict = {}
for iField, field in enumerate (cursor.description):
aDict [field [0]] = row [iField]
return aDict
class Db:
def __init__ (self, app):
self.app = app
def connect (self):
self.connection = sqlite3.connect (self.app.dbFileName, check_same_thread = False, isolation_level = None) # Will create db if nonexistent
self.connection.row_factory = dictFactory
self.cs = LockableCursor (self.connection.cursor ())
Example of use:
if not ok and self.user: # Not logged out
# Get role data for any later use
userIdsRoleIds = self.cs.execute ('all', 'SELECT role_id FROM users_roles WHERE user_id == {}'.format (self.user ['id']))
for userIdRoleId in userIdsRoleIds:
self.userRoles.append (self.cs.execute ('one', 'SELECT name FROM roles WHERE id == {}'.format (userIdRoleId ['role_id'])))
Another example:
self.cs.execute ('CREATE TABLE users (id INTEGER PRIMARY KEY, email_address, password, token)')
self.cs.execute ('INSERT INTO users (email_address, password) VALUES ("{}", "{}")'.format (self.app.defaultUserEmailAddress, self.app.defaultUserPassword))
# Create roles table and insert default role
self.cs.execute ('CREATE TABLE roles (id INTEGER PRIMARY KEY, name)')
self.cs.execute ('INSERT INTO roles (name) VALUES ("{}")'.format (self.app.defaultRoleName))
# Create users_roles table and assign default role to default user
self.cs.execute ('CREATE TABLE users_roles (id INTEGER PRIMARY KEY, user_id, role_id)')
defaultUserId = self.cs.execute ('one', 'SELECT id FROM users WHERE email_address = "{}"'.format (self.app.defaultUserEmailAddress)) ['id']
defaultRoleId = self.cs.execute ('one', 'SELECT id FROM roles WHERE name = "{}"'.format (self.app.defaultRoleName)) ['id']
self.cs.execute ('INSERT INTO users_roles (user_id, role_id) VALUES ({}, {})'.format (defaultUserId, defaultRoleId))
Complete program using this construction downloadable at: http://www.josmith.org/
N.B. The code above is experimental, there may be (fundamental) issues when using this with (many) concurrent requests (e.g. as part of a WSGI server). Performance is not critical for my application. The simplest thing probably would have been to just use MySql, but I like to experiment a little, and the zero installation thing about SqLite appealed to me. If anyone thinks the code above is fundamentally flawed, please react, as my purpose is to learn. If not, I hope this is useful for others.
Upvotes: 7