Reputation: 1295
I have code that works with Thread in Python, but I want to switch to Process because, if I have understood correctly, that will give me a speed-up. Here is the code with Thread:
threads.append(Thread(target=getId, args=(my_queue, read)))
threads.append(Thread(target=getLatitude, args=(my_queue, read)))
The code works by putting the return values in the Queue; after a join on the threads list, I can retrieve the results. After changing the code and the import statement, my code now looks like this:
threads.append(Process(target=getId, args=(my_queue, read)))
threads.append(Process(target=getLatitude, args=(my_queue, read)))
However, it does not execute anything and the Queue is empty. With Thread the Queue is not empty, so I think the problem is related to Process. I have read answers saying that the Process class does not work on Windows. Is that true, or is there a way to make it work (adding freeze_support() does not help)? If it cannot work, is multithreading on Windows actually executed in parallel on different cores?
ref:
Python multiprocessing example not working
Python code with multiprocessing does not work on Windows
Multiprocessing process does not join when putting complex dictionary in return queue (which explains that fork does not exist on Windows)
EDIT: To add some details: the code with Process actually works on CentOS.
EDIT2: Added a simplified version of my code with processes; the code was tested on CentOS.
import pandas as pd
from multiprocessing import Process, freeze_support
from multiprocessing import Queue

#%% Global variables
datasets = []
latitude = []

def fun(key, job):
    global latitude
    if(key == 'LAT'):
        latitude.append(job)

def getLatitude(out_queue, skip = None):
    latDict = {'LAT' : latitude}
    out_queue.put(latDict)

n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
print("Number of baboon:" + str(n))
read = []
for i in range(0,n):
    threads = []
    my_queue = Queue()
    threads.append(Process(target=getLatitude, args=(my_queue, read)))
    for t in threads:
        freeze_support() # try both with and without this line
        t.start()
    for t in threads:
        t.join()
    while not my_queue.empty():
        try:
            job = my_queue.get()
            key = list(job.keys())
            fun(key[0],job[key[0]])
        except:
            print("END")
    read.append(i)
Upvotes: 1
Views: 7055
Reputation: 177471
Per the documentation, you need the following after the function definitions. When Python creates the subprocesses on Windows, they import your script, so any code at the global level is run again in each of them. Put the code you only want to run in the main process under this guard:
if __name__ == '__main__':
    n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
    # etc.
Indent the rest of the code under this if.
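For illustration, here is a minimal, self-contained sketch of that layout. It is not your actual program: the names get_latitude and collect and the sample values are hypothetical, and the CSV reading is left out. It also reads from the queue before calling join(), and calls get() once per worker instead of polling empty(), which is the order the multiprocessing documentation suggests when processes use queues.

```python
from multiprocessing import Process, Queue


def get_latitude(out_queue, values):
    # Runs in the child process: package the result and send it back.
    out_queue.put({'LAT': list(values)})


def collect(values):
    # Start one worker, read its result, then wait for it to exit.
    my_queue = Queue()
    worker = Process(target=get_latitude, args=(my_queue, values))
    worker.start()
    result = my_queue.get()  # read before join(); one get() per worker
    worker.join()
    return result


if __name__ == '__main__':
    # Only the parent process runs this block. A child that re-imports
    # the script skips it, so no processes are started recursively.
    print(collect([45.1, 45.2]))
```

The data is passed to the worker as an argument here because, when the script is re-imported in a child, module-level lists such as latitude start out empty again and do not see what the parent appended.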
Upvotes: 3