Reputation: 3910
I have a Python 3 program that updates a large list of rows based on their ids (in a table in a Postgres 9.5 database).
I use multiprocessing to speed up the process. As Psycopg's connections can’t be shared across processes, I create a connection for each row, then close it.
Overall, multiprocessing is faster than running in a single process (5 times faster with 8 CPUs). However, creating a connection is slow: I'd like to create just a few connections and keep them open for as long as required.
Since .map() chops ids_list into a number of chunks which it submits to the process pool, would it be possible to share a database connection for all ids in the same chunk/process?
Sample code:
from multiprocessing import Pool

import psycopg2


def create_db_connection():
    conn = psycopg2.connect(database=database,
                            user=user,
                            password=password,
                            host=host)
    return conn


def my_function(item_id):
    conn = create_db_connection()
    # Other CPU-intensive operations are done here
    cur = conn.cursor()
    cur.execute("""
        UPDATE table
        SET
            my_column = 1
        WHERE id = %s;
        """,
        (item_id, ))
    cur.close()
    conn.commit()
    conn.close()  # each row pays the cost of opening and closing a connection


if __name__ == '__main__':
    ids_list = []  # Long list of ids
    pool = Pool()  # os.cpu_count() processes
    pool.map(my_function, ids_list)
Thanks for any help you can provide.
Upvotes: 6
Views: 2742
Reputation: 403
You can use the initializer parameter of the Pool constructor: set up the DB connection in the initializer function, passing the connection credentials via the initargs parameter. Each worker process then opens one connection at startup and reuses it for every id it processes.
Have a look at the docs: https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool
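A minimal sketch of that approach (the names init_worker and conn are my own; the credentials database, user, password and host are placeholders, as in the question):

from multiprocessing import Pool

import psycopg2

conn = None  # one connection per worker process, created by the initializer


def init_worker(database, user, password, host):
    # Runs once in each worker process when the pool starts
    global conn
    conn = psycopg2.connect(database=database,
                            user=user,
                            password=password,
                            host=host)


def my_function(item_id):
    # Reuse this worker's connection instead of opening a new one per row
    cur = conn.cursor()
    cur.execute("""
        UPDATE table
        SET
            my_column = 1
        WHERE id = %s;
        """,
        (item_id, ))
    cur.close()
    conn.commit()


if __name__ == '__main__':
    ids_list = []  # Long list of ids
    pool = Pool(initializer=init_worker,
                initargs=(database, user, password, host))
    pool.map(my_function, ids_list)

This way the connection cost is paid once per worker at pool startup rather than once per id.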
Upvotes: 1