Reputation: 41
Currently, I am trying to spawn a process in a Python program which again creates threads that continuously update variables in the process address space. So far I came up with this code which runs, but the update of the variable seems not to be propagated to the process level. I would have expected that defining a variable in the process address space and using global in the thread (which shares the address space of the process) would allow the thread to manipulate the variable and propagate the changes to the process.
Below is a minimal example of the problem:
import multiprocessing
import threading
import time
import random
def process1():
lst = {}
url = "url"
thrd = threading.Thread(target = urlCaller, args = (url,))
print("process alive")
thrd.start()
while True:
# the process does some CPU intense calculation
print(lst)
time.sleep(2)
def urlCaller(url):
global lst
while True:
# the thread continuously pulls data from an API
# this is I/O heavy and therefore done by a thread
lst = {random.randint(1,9), random.randint(20,30)}
print(lst)
time.sleep(2)
prcss = multiprocessing.Process(target = process1)
prcss.start()
The process always prints an empty list while the thread prints, as expected, a list with two integers. I would expect that the process prints a list with two integers as well. (Note: I am using Spyder as IDE and somehow there is only printed something to the console if I run this code on Linux/Ubuntu but nothing is printed to the console if I run the exact same code in Spyder on Windows.)
I am aware that the use of global variables is not always a good solution but I think it serves the purpose well in this case.
You might wonder why I want to create a thread within a process. Basically, I need to run the same complex calculation on different data sets that constantly change. Hence, I need multiple processes (one for each data set) to optimize the utilization of my CPU and use threads within the processes to make the I/O process most efficient. The data depreciates very fast, therefore, I cannot just store it in a database or file, which would of course simplify the communication process between data producer (thread) and data consumer (process).
Upvotes: 0
Views: 163
Reputation: 41
Thanks to the previous answer, I figured out that it's best to implement a process class and define "thread-functions" within this class. Now, the threads can access a shared variable and manipulate this variable without the need of using "thread.join()" and terminating a thread.
Below is a minimal example in which 2 concurrent threads provide data for a parent process.
import multiprocessing
import threading
import time
import random
class process1(multiprocessing.Process):
lst = {}
url = "url"
def __init__(self, url):
super(process1, self).__init__()
self.url = url
def urlCallerInt(self, url):
while True:
self.lst = {random.randint(1,9), random.randint(20,30)}
time.sleep(2)
def urlCallerABC(self, url):
while True:
self.lst = {"Ab", "cD"}
time.sleep(5)
def run(self):
t1 = threading.Thread(target = self.urlCallerInt, args=(self.url,))
t2 = threading.Thread(target = self.urlCallerABC, args=(self.url,))
t1.start()
t2.start()
while True:
print(self.lst)
time.sleep(1)
p1 = process1("url")
p1.start()
Upvotes: 0
Reputation: 101959
You are defining a local variable lst
inside the function process1
, so what urlCaller
does is irrelevant, it cannot change the local variable of a different function. urlCaller
is defining a global variable but process1
can never see it because it's shadowed by the local variable you defined.
You need to remove lst = {}
from that function and find an other way to return a value or declare the variable global
there too:
def process1():
global lst
lst = {}
url = "url"
thrd = threading.Thread(target = urlCaller, args = (url,))
print("process alive")
thrd.start()
while True:
# the process does some CPU intense calculation
print(lst)
time.sleep(2)
I'd use something like concurrent.futures
instead of the threading
module directly.
Upvotes: 1