cmq

Reputation: 41

Creating a process that creates a thread which again updates a global variable

I am trying to spawn a process in a Python program that in turn creates threads which continuously update variables in the process address space. The code below runs, but the updates to the variable do not seem to propagate to the process level. I would have expected that defining a variable in the process address space and declaring it global in the thread (which shares the process's address space) would let the thread manipulate the variable and make the changes visible to the process.

Below is a minimal example of the problem:

import multiprocessing 
import threading
import time
import random

def process1():
    lst = {}
    url = "url"
    thrd = threading.Thread(target = urlCaller, args = (url,))
    print("process alive")
    thrd.start()

    while True:
        # the process does some CPU intense calculation
        print(lst)
        time.sleep(2)

def urlCaller(url):
    global lst

    while True:
        # the thread continuously pulls data from an API
        # this is I/O heavy and therefore done by a thread
        lst = {random.randint(1,9), random.randint(20,30)}
        print(lst)
        time.sleep(2)


prcss = multiprocessing.Process(target = process1)
prcss.start()

The process always prints an empty container (`{}`) while the thread, as expected, prints a set with two integers. I would expect the process to print the two integers as well. (Note: I am using Spyder as my IDE; on Linux/Ubuntu output appears in the console, but running the exact same code in Spyder on Windows prints nothing.)

I am aware that the use of global variables is not always a good solution, but I think it serves the purpose well in this case.

You might wonder why I want to create a thread within a process. Basically, I need to run the same complex calculation on different data sets that constantly change. Hence, I need multiple processes (one for each data set) to optimize the utilization of my CPU and use threads within the processes to make the I/O process most efficient. The data depreciates very fast, therefore, I cannot just store it in a database or file, which would of course simplify the communication process between data producer (thread) and data consumer (process).

Upvotes: 0

Views: 163

Answers (2)

cmq

Reputation: 41

Thanks to the previous answer, I figured out that it's best to implement a process class and define the thread functions within that class. Now the threads can access and manipulate a shared instance variable without needing `thread.join()` or terminating a thread.

Below is a minimal example in which 2 concurrent threads provide data for a parent process.

import multiprocessing
import threading
import time
import random

class process1(multiprocessing.Process):
    def __init__(self, url):
        super(process1, self).__init__()
        self.url = url
        self.lst = {}  # shared default until a worker thread provides data

    def urlCallerInt(self, url):
        while True:
            self.lst = {random.randint(1,9), random.randint(20,30)}
            time.sleep(2)

    def urlCallerABC(self, url):
        while True:
            self.lst = {"Ab", "cD"}
            time.sleep(5)

    def run(self):
        t1 = threading.Thread(target = self.urlCallerInt, args=(self.url,))
        t2 = threading.Thread(target = self.urlCallerABC, args=(self.url,))
        t1.start()
        t2.start()

        while True:
            print(self.lst)
            time.sleep(1)

p1 = process1("url")
p1.start()

Upvotes: 0

Bakuriu

Reputation: 101959

You are defining a local variable lst inside the function process1, so whatever urlCaller does is irrelevant: it cannot change a local variable of a different function. urlCaller does define a global variable, but process1 never sees it because it is shadowed by the local variable you defined.

You need to remove lst = {} from that function and find another way to return a value, or declare the variable global there too:

def process1():
    global lst
    lst = {}
    url = "url"
    thrd = threading.Thread(target = urlCaller, args = (url,))
    print("process alive")
    thrd.start()

    while True:
        # the process does some CPU intense calculation
        print(lst)
        time.sleep(2)

I'd use something like concurrent.futures instead of the threading module directly.
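
For illustration, a minimal sketch of that suggestion, using a hypothetical fetch function as a stand-in for the real API call:

```python
import concurrent.futures
import random
import time

def fetch(url):
    # stand-in for the real I/O-heavy API call (hypothetical)
    time.sleep(0.1)
    return {random.randint(1, 9), random.randint(20, 30)}

urls = ["url1", "url2", "url3"]

# the executor manages the worker threads; no manual start()/join()
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(fetch, urls):
        print(result)
```

Results come back through return values rather than a shared global, which sidesteps the shadowing problem entirely.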

Upvotes: 1
