femibyte
femibyte

Reputation: 3497

multiprocessing module and distinct psycopg2 connections

I am very puzzled as to the behavior of some multiprocessing code that is using psycopg2 to make queries in parallel to a postgres db.

Essentially, I am making the same query (with different params) to various partitions of a larger table. I am using multiprocessing.Pool to fork off a separate query.

My multiprocessing call looks like this:

pool = Pool(processes=num_procs)
results=pool.map(run_sql, params_list)

My run_sql code looks like this:

def run_sql(zip2):
    conn = get_connection()
    curs = conn.cursor()
    print "conn: %s curs:%s pid=%s" % (id(conn), id(curs), os.getpid())
    ...
    curs.execute(qry)
    records = curs.fetchall()

def get_connection()
    ...
    conn = psycopg2.connect(user=db_user, host=db_host, 
                         dbname=db_name, password=db_pwd)

    return conn

So my expectation is that each process would get a separate db connection via the call to get_connection() and that print id(conn) would display a distinct value. However, that doesn't seem to be the case and I am at a loss to explain it. Even print id(curs) is the same. Only print os.getpid() shows a difference. Does it somehow use the same connection for each forked process ?

conn: 4614554592 curs:4605160432 pid=46802
conn: 4614554592 curs:4605160432 pid=46808
conn: 4614554592 curs:4605160432 pid=46810
conn: 4614554592 curs:4605160432 pid=46784
conn: 4614554592 curs:4605160432 pid=46811

Upvotes: 8

Views: 4194

Answers (1)

femibyte
femibyte

Reputation: 3497

I think I've figured this out. The answer lies in the fact that multiprocessing in Python is shared-nothing so the entire memory space is copied, functions and all. Hence for each process, even though the pid is different, the memory spaces are copies of each other and the address of the connection within the memory space ends up being the same. The same reason is why declaring a global connection pool as I did initially was useless, each process ended up with its own connection pool with just 1 connection active at a time.

Upvotes: 8

Related Questions