Reputation: 6290
I have a config file, config.py, which holds a global variable (5 is the default):
# config.py
globalVar = 5
Now in a module run.py I'm setting the global variable and then I call a printing function:
# run.py
import config
import test
config.globalVar = 7
test.do_printing()
# test.py
import config
def do_printing():
    print(config.globalVar)
This works well (7 is printed), but if I use multiple threads for printing (in test.py) it no longer works: the threads do not see the change made by run.py, and 5 is printed.
How can this be solved?
Upvotes: 4
Views: 6324
Reputation: 110696
Even when running on a single thread you might have issues doing that. For example, if you do from config import globalVar instead and then rebind globalVar in the local module, the local name just loses its reference to the object in the config module.
And even if you don't do that, if changes to the variable take place at import time of your various modules, it is very hard to keep track of the actual import order.
When you add threads, that just becomes 100% unmanageable, due to all sorts of race conditions. Other than a race condition (i.e. one of your threads reads the variable before it has been set on the other thread), or incorrect importing, threads should not affect the visibility of global variable changes in the way you describe.
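To illustrate that last point, here is a minimal single-file sketch. The SimpleNamespace named config is only a stand-in for the real config module (an assumption made so the snippet runs on its own), and the printed list stands in for the output of print(). Threads started after the assignment all observe the new value:

```python
import threading
from types import SimpleNamespace

# Stand-in for the real config module (assumption, so the sketch is one file).
config = SimpleNamespace(globalVar=5)
printed = []  # collects what each thread observed


def do_printing():
    # Attribute lookup happens when the thread runs, not at import time.
    printed.append(config.globalVar)


config.globalVar = 7  # the change made by "run.py"

threads = [threading.Thread(target=do_printing) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(printed)  # [7, 7, 7]
```

If a setup like this still shows 5, the cause is most likely one of the other issues described in this answer rather than threading itself.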
The solution for having deterministic code is to use data structures that are appropriate for that interchange across threads (and data protection across threads).
The threading module itself offers the Event object, which one thread can use to wait until another thread has changed the value you are expecting:
config.py:
from threading import Event
changed = Event()
changed.clear()  # a new Event already starts cleared; kept for explicitness
global_var = 5
module in worker thread:
import config
def do_things():
    while True:
        config.changed.wait()  # blocks until the other thread sets the event
        config.changed.clear()  # reset, so the next wait() blocks until the next change
        do_more_things_with(config.global_var)
and on the main thread:
import config
config.global_var = 7
config.changed.set()  # frees the waiting thread to run
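For anyone who wants to try the pattern without creating several files, here is a compact single-file sketch. The SimpleNamespace called config is a stand-in for the config module, and the seen list is a hypothetical helper used only to record what the worker observed:

```python
import threading
from types import SimpleNamespace

# Stand-in for config.py (assumption): the value plus the Event guarding it.
config = SimpleNamespace(global_var=5, changed=threading.Event())
seen = []  # hypothetical helper: records what the worker read


def worker():
    config.changed.wait()            # block until the main thread signals
    seen.append(config.global_var)   # read through the dotted name


t = threading.Thread(target=worker)
t.start()

config.global_var = 7   # update the value first...
config.changed.set()    # ...then wake the waiting thread
t.join()

print(seen)  # [7]
```

Setting the value before calling set() is what guarantees that the worker cannot read the old value.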
Note that in the above code I always refer to the objects in config with dotted notation. That makes no difference for the Event object: I could do from config import changed and it would work, since I am only dealing with the internal state of that same object. But if I do from config import global_var and reassign it with global_var = 7, that only changes what the global_var name in the current module points to. config.global_var still references the original value.
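The difference between the two import styles can be checked with a small sketch. To keep it self-contained, it builds a throwaway module named demo_config in memory (a hypothetical name, standing in for config.py) instead of reading a file from disk:

```python
import sys
import types

# Build a throwaway in-memory module (hypothetical stand-in for config.py).
demo_config = types.ModuleType("demo_config")
demo_config.global_var = 5
sys.modules["demo_config"] = demo_config

from demo_config import global_var  # binds a new local name to the same object

global_var = 7  # rebinds only the local name
after_local_rebind = demo_config.global_var   # the module attribute is untouched

import demo_config as cfg

cfg.global_var = 7  # sets the attribute on the module object itself
after_attribute_set = demo_config.global_var  # now every importer sees the change

print(after_local_rebind, after_attribute_set)  # 5 7
```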
And while you are at it, it is worth taking a look at the queue module, as well as at thread-local objects.
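As a rough sketch of the queue approach: instead of sharing a variable, the main thread hands each new value to the worker through a queue.Queue. The names updates, received and STOP are made up for this illustration:

```python
import queue
import threading

updates = queue.Queue()  # thread-safe channel from main thread to worker
received = []            # what the worker got, in order
STOP = object()          # sentinel telling the worker to exit


def worker():
    while True:
        value = updates.get()  # blocks until an item is available
        if value is STOP:
            break
        received.append(value)


t = threading.Thread(target=worker)
t.start()

for value in (5, 7):
    updates.put(value)
updates.put(STOP)
t.join()

print(received)  # [5, 7]
```

With this design the worker never has to guess whether a value has been set yet: it simply waits for the next item.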
Another possibility for not seeing the changes is that, since the parallelism is not in your code but in another library, that library is spawning processes with the multiprocessing module instead of threads.
If you were expecting threads but actually have multiprocessing-spawned processes, the problem would be exactly what you describe: changes to global variables are not visible in the other processes, simply because each process has its own copy of the variables.
If that is the case, it is possible to have (numeric, typed) objects that are synchronized across processes. Check the Array and Value classes, and multiprocessing Queue to send and receive (mostly) arbitrary objects.
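Here is a small sketch contrasting a plain global with a multiprocessing Value. It assumes a POSIX system where the "fork" start method is available (chosen only to keep the example short); the names plain_global, child and run_demo are made up for the illustration:

```python
import multiprocessing as mp

plain_global = 5  # ordinary module-level variable, copied into each process


def child(shared, ready, results):
    ready.wait()  # wait until the parent has made its changes
    # Report what this process sees: its own copy and the shared Value.
    results.put((plain_global, shared.value))


def run_demo():
    global plain_global
    plain_global = 5               # reset so the demo is repeatable
    ctx = mp.get_context("fork")   # assumption: POSIX-only start method
    shared = ctx.Value("i", 5)     # a C int living in shared memory
    ready = ctx.Event()
    results = ctx.Queue()

    proc = ctx.Process(target=child, args=(shared, ready, results))
    proc.start()

    plain_global = 7   # changes only the parent's copy
    shared.value = 7   # visible to the child through shared memory
    ready.set()

    seen = results.get(timeout=10)
    proc.join()
    return seen


if __name__ == "__main__":
    print(run_demo())  # (5, 7)
```

The child still reports 5 for the plain global, because it works on its own copy, while the Value reflects the parent's update.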
(Add an import multiprocessing; print(multiprocessing.current_process()) line to your code to be sure. Whatever the result, please suggest to the maintainers of the RandomizedSearchCV documentation that they mention explicitly what they are doing for parallelism.)
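A slightly more verbose variant of that check might look like the sketch below. The helper name describe_context is made up here; it only uses multiprocessing.current_process() and threading.current_thread() from the standard library:

```python
import multiprocessing
import threading


def describe_context():
    # Report which process and which thread is executing this call.
    proc = multiprocessing.current_process()
    thread = threading.current_thread()
    return f"process={proc.name} (pid {proc.pid}), thread={thread.name}"


print(describe_context())
```

Calling it from inside the printing function should show whether the calls all share one process name and pid (threads) or report different ones (separate processes).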
Upvotes: 4