Antoine Dusséaux

Reputation: 3910

Share DB connection in a process pool

I have a Python 3 program that updates a large list of rows based on their ids (in a table in a Postgres 9.5 database).

I use multiprocessing to speed up the process. As Psycopg's connections can’t be shared across processes, I create a connection for each row, then close it.

Overall, multiprocessing is faster than a single process (5 times faster with 8 CPUs). However, creating a new connection for every row is slow: I'd like to create just a few connections and keep them open as long as required.

Since .map() chops ids_list into a number of chunks which it submits to the process pool, would it be possible to share a database connection for all ids in the same chunk/process?

Sample code:

from multiprocessing import Pool
import psycopg2


def create_db_connection():
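    # database, user, password and host are connection credentials defined elsewhere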
    conn = psycopg2.connect(database=database,
                            user=user,
                            password=password,
                            host=host)
    return conn


def my_function(item_id):
    conn = create_db_connection()

    # Other CPU-intensive operations are done here

    cur = conn.cursor()
    cur.execute("""
        UPDATE table
        SET
        my_column = 1
        WHERE id = %s;
        """,
        (item_id, ))
    cur.close()
    conn.commit()
    conn.close()


if __name__ == '__main__':
    ids_list = []  # Long list of ids

    pool = Pool()  # os.cpu_count() processes
    pool.map(my_function, ids_list)
    pool.close()
    pool.join()

Thanks for any help you can provide.

Upvotes: 6

Views: 2742

Answers (1)

Sidias-Korrado

Reputation: 403

You can use the initializer parameter of the Pool constructor: set up the DB connection in the initializer function, which is called once in each worker process when the pool starts. The connection credentials can be passed to it via initargs.

Have a look at the docs: https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool
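As a minimal sketch of that approach, adapting the code from the question (init_worker is just an illustrative name here, and database, user, password and host are placeholder credentials as in the question):

from multiprocessing import Pool
import psycopg2


conn = None  # per-process connection, set up by the initializer


def init_worker(database, user, password, host):
    # Called once in each worker process when the pool starts
    global conn
    conn = psycopg2.connect(database=database,
                            user=user,
                            password=password,
                            host=host)


def my_function(item_id):
    # Other CPU-intensive operations are done here

    cur = conn.cursor()
    cur.execute("""
        UPDATE table
        SET
        my_column = 1
        WHERE id = %s;
        """,
        (item_id, ))
    cur.close()
    conn.commit()


if __name__ == '__main__':
    ids_list = []  # Long list of ids

    pool = Pool(initializer=init_worker,
                initargs=(database, user, password, host))
    pool.map(my_function, ids_list)
    pool.close()
    pool.join()

Each worker then opens a single connection when the pool starts and reuses it for every chunk of ids it receives; the connections last as long as the worker processes do.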

Upvotes: 1
