n00bz0r

Reputation: 87

Python 2 vs. Python 3 multiprocessing.Process issue

I'm trying to understand what changed in the multiprocessing module between Python 2 and Python 3. On Python 2, this code works like a charm:

import multiprocessing

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    threads = []
    for row in res.fetchall():
        p = multiprocessing.Process(target=RunPrice, args=(row[0],row[1]))
        threads.append(p)
        p.start()
    for proc in threads:
        proc.join()

Let's say there are 2000 entries to process in SELLS. On Python 2 this script runs and exits as expected. On Python 3 I get:

  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 69, in _launch
    child_r, parent_w = os.pipe()
OSError: [Errno 24] Too many open files

Any idea what changed between Python 2 and Python 3?

Upvotes: 0

Views: 189

Answers (1)

Booboo

Reputation: 44323

I am assuming that your actual RunPrice function is a bit more CPU-intensive than what you show. Otherwise, this would not be a good candidate for multiprocessing. Your code starts one process per row before joining any of them, and the traceback shows Python 3 opening a pipe for each process it launches, so 2000 rows can exhaust the limit on open file descriptors. A processing pool with a bounded number of processes avoids that.

If RunPrice is very CPU-intensive and does not relinquish the CPU to wait for I/O to complete, there is no advantage in a pool with more processes than you have CPU cores. Creating processes is not a particularly inexpensive operation (although certainly not as expensive as it would be if you were running on Windows).

from multiprocessing import Pool

def RunPrice(items, price):
    print("There is %s items, price is: %s" % (items, price))

def GetTargetItemsAndPrice(cursor):
    res = cursor.execute("SELECT DISTINCT items, price FROM SELLS")
    rows = res.fetchall()
    if not rows:
        # Nothing to process; Pool(0) would raise ValueError.
        return
    MAX_POOL_SIZE = 1024
    # If RunPrice is very CPU-intensive, it may not pay to have a pool size
    # greater than the number of CPU cores you have. In that case:
    #from multiprocessing import cpu_count
    #MAX_POOL_SIZE = cpu_count()
    pool_size = min(MAX_POOL_SIZE, len(rows))
    with Pool(pool_size) as pool:
        # starmap unpacks each (items, price) tuple into RunPrice's arguments
        # and collects the return values from RunPrice:
        results = pool.starmap(RunPrice, [(row[0], row[1]) for row in rows])
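
For the CPU-bound case mentioned in the comments above, here is a minimal, self-contained sketch that caps the pool at the number of CPU cores. It is only an illustration under some assumptions: run_price, pool_size_for, process_rows and sample_rows are hypothetical names, the sample data stands in for the rows you would fetch from SELLS, and run_price returns a string instead of printing so that starmap has something to collect.

```python
from multiprocessing import Pool, cpu_count


def run_price(items, price):
    # Hypothetical stand-in for the real (presumably CPU-bound) work.
    return "There are %s items, price is: %s" % (items, price)


def pool_size_for(task_count, max_size=None):
    # Default cap: one worker per CPU core (an assumption for CPU-bound work).
    if max_size is None:
        max_size = cpu_count()
    # Never more workers than tasks, and never fewer than one,
    # because Pool(0) raises ValueError.
    return max(1, min(max_size, task_count))


def process_rows(rows):
    # The pool reuses a fixed set of worker processes for all rows,
    # instead of starting one process per row.
    with Pool(pool_size_for(len(rows))) as pool:
        return pool.starmap(run_price, rows)


if __name__ == "__main__":
    # Made-up data in place of the (items, price) rows from the database.
    sample_rows = [(n, n * 1.5) for n in range(1, 2001)]
    results = process_rows(sample_rows)
    print(len(results))
    print(results[0])
```

With this shape the number of live processes stays at the pool size however many rows the query returns. If RunPrice spends most of its time waiting on I/O instead, a larger cap passed as max_size may still be worth trying.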

Upvotes: 1
