I'm trying to divvy up the task of looking up historical stock price data for a list of symbols by using Pool from the multiprocessing library.
This works great until I try to use the data I get back. I have my hist_price function defined, and it outputs to a list-of-dicts, pcl. I can print(pcl) and it has been flawless, but if I try to print(pcl) after the if __name__ == '__main__': block, it blows up saying pcl is undefined. I've tried declaring global pcl in a couple of places, but it doesn't make a difference.
from multiprocessing import Pool

syms = ['List', 'of', 'symbols']

def hist_price(sym):
    #... lots of code looking up data, calculations, building dicts...
    stlh = {"Sym": sym, "10D Max": pcmax, "10D Min": pcmin} #simplified
    return stlh

#global pcl
if __name__ == '__main__':
    pool = Pool(4)
    #global pcl
    pcl = pool.map(hist_price, syms)
    print(pcl) #this works
    pool.close()
    pool.join()

print(pcl) #says pcl is undefined
#...rest of my code, dependent on pcl...
I've also tried removing the if __name__ == '__main__': block, but it gives me a RuntimeError telling me specifically to put it back. Is there some other way to make variables usable outside of the if block?
I think there are two parts to your issue. The first is "what's wrong with pcl in the current code?", and the second is "why do I need the if __name__ == "__main__" guard block at all?".
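Let's address them in order. The problem with the pcl variable is that it is only defined in the if block, so if the module gets loaded without being run as a script (which is what sets __name__ == "__main__"), it will not be defined when the later code runs. That is exactly what happens in each worker process your Pool starts on Windows: the worker re-imports your script under a different __name__, the guarded block is skipped, and the unguarded print(pcl) at the bottom raises the NameError. Declaring global pcl doesn't help, because pcl is already a module-level global; the assignment simply never runs in the workers.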
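You can see this without starting any processes by executing the same source under two different values of __name__, using runpy.run_path and its run_name argument. This is a minimal sketch: the script text is a hypothetical, cut-down version of the question's code, load_as is an illustrative helper name, and "__mp_main__" is the name multiprocessing gives the main module inside spawned workers.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical cut-down version of the question's script: pcl is only assigned
# inside the guard, but it is used again at module top level afterwards.
SCRIPT = textwrap.dedent("""
    if __name__ == "__main__":
        pcl = ["result"]
    print(pcl)
""")

def load_as(run_name):
    """Execute SCRIPT from a temporary file with the given value of __name__."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT)
        path = f.name
    try:
        runpy.run_path(path, run_name=run_name)
        return "ok"
    except NameError as exc:
        return f"NameError: {exc}"
    finally:
        os.remove(path)

# Run as a script: the guard is true, pcl is assigned, and print(pcl) works.
print(load_as("__main__"))
# Loaded the way a spawned worker loads it: the guard is false, so pcl is never assigned.
print(load_as("__mp_main__"))
```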
To fix this, you can change how your code is structured. The simplest fix would be to guard the other bits of the code that use pcl within an if __name__ == "__main__" block too (e.g. by indenting them all under the current block). An alternative fix would be to put the code that uses pcl into functions (which can be declared outside the guard block), then call those functions from within an if __name__ == "__main__" block. That would look something like this:
def do_stuff_with_pcl(pcl):
    print(pcl)

if __name__ == "__main__":
    # multiprocessing code, etc
    pcl = ...
    do_stuff_with_pcl(pcl)
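Filled in with the question's Pool code, that structure might look like the sketch below. It assumes a hypothetical stub in place of the real hist_price (whose lookup code is elided in the question), placeholder symbols, and an illustrative do_stuff_with_pcl that ranks symbols by their 10-day max. Only definitions sit at module level, so a worker that re-imports the file just defines functions and never touches pcl.

```python
from multiprocessing import Pool

syms = ["AAA", "B", "CC"]  # hypothetical placeholder symbols

def hist_price(sym):
    # Hypothetical stand-in for the real price lookup and calculations.
    prices = [len(sym) + i for i in range(10)]
    return {"Sym": sym, "10D Max": max(prices), "10D Min": min(prices)}

def do_stuff_with_pcl(pcl):
    # Illustrative "rest of the code": rank symbols by their 10-day max.
    ranked = sorted(pcl, key=lambda d: d["10D Max"], reverse=True)
    return [d["Sym"] for d in ranked]

def main():
    # The with-block terminates the pool on exit; map() has already
    # collected every result by then, so pcl is an ordinary list.
    with Pool(4) as pool:
        pcl = pool.map(hist_price, syms)
    return do_stuff_with_pcl(pcl)

if __name__ == "__main__":
    print(main())
```

Keeping the Pool inside main() also means there is exactly one place that starts processes, and it is only reachable through the guard.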
As for why the issue came up in the first place, the ultimate cause is using the multiprocessing module on Windows. You can read about the issue in the multiprocessing documentation (the "Safe importing of main module" note in its programming guidelines).
When multiprocessing creates a new process for its Pool, it needs to initialize that process with a copy of the current module's state. Because Windows doesn't have fork (which copies the parent process's memory into a child process automatically), Python needs to set everything up from scratch. In each child process, it loads the module from its file (under the name "__mp_main__" rather than "__main__", so guarded code is skipped), and if the module's top-level code tries to create a new Pool, you'd have a recursive situation where each of the child processes would start spawning a whole new set of child processes of its own.
The multiprocessing code has some guards against that (that's the RuntimeError you saw when you removed the if block), so you won't fork bomb yourself out of simple carelessness, but you still need to do some of the work yourself too, by using if __name__ == "__main__" to guard any code that shouldn't be run in the child processes.