Guido Muscioni

Multiprocessing with Python and Windows

I have code that works with Thread in Python, but I want to switch to Process because, if I have understood correctly, that will give me a speed-up. Here is the code with Thread:

threads.append(Thread(target=getId, args=(my_queue, read)))
threads.append(Thread(target=getLatitude, args=(my_queue, read)))
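For reference, here is a minimal self-contained sketch of this thread-based pattern. It assumes the thread version uses `queue.Queue` and that each worker puts a one-key dict on it; `get_id`, `get_latitude` and `collect` are hypothetical stand-ins, not the original functions.

```python
import queue
from threading import Thread


def get_id(out_queue, read):
    # Hypothetical stand-in for getId: puts one single-key dict on the queue.
    out_queue.put({"ID": len(read)})


def get_latitude(out_queue, read):
    # Hypothetical stand-in for getLatitude.
    out_queue.put({"LAT": [0.5 * r for r in read]})


def collect(read):
    my_queue = queue.Queue()
    threads = [
        Thread(target=get_id, args=(my_queue, read)),
        Thread(target=get_latitude, args=(my_queue, read)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # After join() every worker has finished, so the queue holds all results.
    results = {}
    while not my_queue.empty():
        results.update(my_queue.get())
    return results


print(collect([0, 1, 2]))
```

Threads share the parent's memory, which is why this works unchanged on any platform.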

The code works by putting each return value in the Queue; after a join on the threads list, I can retrieve the results. After changing the code and the import statement, my code now looks like this:

threads.append(Process(target=getId, args=(my_queue, read)))
threads.append(Process(target=getLatitude, args=(my_queue, read)))

However, it does not execute anything and the Queue stays empty; with Thread the Queue is not empty, so I think the problem is related to Process. I have read answers saying that the Process class does not work on Windows. Is that true, or is there a way to make it work (adding freeze_support() does not help)? If it cannot work, does multithreading on Windows actually execute in parallel on different cores?

References:

Python multiprocessing example not working

Python code with multiprocessing does not work on Windows

Multiprocessing process does not join when putting complex dictionary in return queue (which explains that fork does not exist on Windows)

EDIT: To add some details: the code with Process does work on CentOS.
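A likely reason for the CentOS/Windows difference is the process start method. On Windows only "spawn" is available, which starts a fresh interpreter that re-imports the script; on Linux the default has historically been "fork", which copies the running parent instead. The short sketch below (the helper name `describe_start_methods` is hypothetical) prints what the current platform supports:

```python
import multiprocessing as mp


def describe_start_methods():
    # All start methods this platform supports; on Windows this is only ["spawn"].
    available = mp.get_all_start_methods()
    # The method Process() will use by default on this platform.
    default = mp.get_start_method()
    return available, default


if __name__ == "__main__":
    available, default = describe_start_methods()
    print("available:", available)
    print("default:", default)
    print("fork supported:", "fork" in available)
```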

EDIT 2: Here is a simplified version of my code with processes, tested on CentOS:

import pandas as pd
from multiprocessing import Process, freeze_support
from multiprocessing import Queue

#%% Global variables

datasets = []

latitude = []

def fun(key, job):
    global latitude
    if(key == 'LAT'):
        latitude.append(job)

def getLatitude(out_queue, skip = None):
    latDict = {'LAT' : latitude}
    out_queue.put(latDict)

n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
print("Number of baboon:" + str(n))

read = []

for i in range(0,n):
    threads = []
    my_queue = Queue()
    threads.append(Process(target=getLatitude, args=(my_queue, read)))

    for t in threads:
        freeze_support() # try both with and without this line
        t.start()

    for t in threads:
        t.join()

    while not my_queue.empty():
        try:
            job = my_queue.get()
            key = list(job.keys())
            fun(key[0],job[key[0]])
        except:
            print("END")  

    read.append(i)    

Answers (1)

Mark Tolonen

Per the documentation, you need the following after the function definitions. On Windows there is no fork, so when Python creates the subprocesses, each one imports your script, and any code at the global level runs again in every child. For the code you only want to run in the main process:

if __name__ == '__main__':
    n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
    # etc.

Indent the rest of the code under this if.
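Putting it together, here is a minimal sketch of the guarded version. It replaces the CSV read with a hypothetical `n_rows` argument, and `get_id`, `get_latitude` and `run` are illustrative stand-ins for the question's functions. The workers are defined at module level so a spawned child can import them, and each result is read from the queue before join(), since the multiprocessing programming guidelines warn that joining a process that still has buffered queue items can deadlock. The same guidelines note that freeze_support() only matters for frozen Windows executables, which is why adding it did not help.

```python
from multiprocessing import Process, Queue


def get_id(out_queue, row_index):
    # Hypothetical worker standing in for getId; must live at module level.
    out_queue.put({"ID": row_index})


def get_latitude(out_queue, row_index):
    # Hypothetical worker standing in for getLatitude.
    out_queue.put({"LAT": row_index * 0.5})


def run(n_rows):
    results = []
    for i in range(n_rows):
        my_queue = Queue()
        procs = [
            Process(target=get_id, args=(my_queue, i)),
            Process(target=get_latitude, args=(my_queue, i)),
        ]
        for p in procs:
            p.start()
        # Read exactly one result per process *before* joining, so no child
        # is left waiting to flush its queue while the parent sits in join().
        row = {}
        for _ in procs:
            row.update(my_queue.get())
        for p in procs:
            p.join()
        results.append(row)
    return results


if __name__ == "__main__":
    # Only the parent runs this block; children importing the script skip it.
    print(run(3))
```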
