wampthing2

Reputation: 85

Python variables not defined after if __name__ == '__main__'

I'm trying to divvy up the task of looking up historical stock price data for a list of symbols by using Pool from the multiprocessing library.

This works great until I try to use the data I get back. I have my hist_price function defined and it outputs to a list-of-dicts pcl. I can print(pcl) and it has been flawless, but if I try to print(pcl) after the if __name__=='__main__': block, it blows up saying pcl is undefined. I've tried declaring global pcl in a couple places but it doesn't make a difference.

from multiprocessing import Pool

syms = ['List', 'of', 'symbols']

def hist_price(sym):
    #... lots of code looking up data, calculations, building dicts...
    stlh = {"Sym": sym, "10D Max": pcmax, "10D Min": pcmin} #simplified
    return stlh

#global pcl
if __name__ == '__main__':
    pool = Pool(4)
    #global pcl
    pcl = pool.map(hist_price, syms)
    print(pcl) #this works
    pool.close() 
    pool.join()

print(pcl) #says pcl is undefined

#...rest of my code, dependent on pcl...

I've also tried removing the if __name__=='__main__': block but it gives me a RuntimeError telling me specifically to put it back. Is there some other way to make variables available outside of the if block?

Upvotes: 2

Views: 2986

Answers (1)

Blckknght

Reputation: 104722

I think there are two parts to your issue. The first is "what's wrong with pcl in the current code?", and the second is "why do I need the if __name__ == "__main__" guard block at all?".

Let's address them in order. The problem with the pcl variable is that it is only assigned inside the if block. Running the file as a script is what sets __name__ to "__main__", so whenever the module gets loaded any other way, the block is skipped, pcl is never created, and the later code that uses it fails.
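
To see that first point in isolation, here is a minimal sketch with no multiprocessing involved. It writes a tiny stand-in module to a temporary file and imports it under a name other than "__main__", which is roughly the situation a child process is in when it reloads your script. SOURCE, load_as_module and demo_module are names made up for this illustration, not part of your code.

```python
import importlib.util
import os
import tempfile

# Hypothetical module source with the same shape as the script in the question:
# pcl is assigned only inside the guard block.
SOURCE = '''
if __name__ == "__main__":
    pcl = ["only", "assigned", "when", "run", "as", "a", "script"]
'''


def load_as_module(source, name="demo_module"):
    """Write source to a temp file and import it under a non-__main__ name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as f:
            f.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        return module


module = load_as_module(SOURCE)
print(module.__name__)         # demo_module
print(hasattr(module, "pcl"))  # False: the guard block was skipped
```

Any top-level line after the guard that touched pcl would raise a NameError in that imported copy, which matches the error you are seeing.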

To fix this, you can change how your code is structured. The simplest fix would be to guard the other bits of the code that use pcl within an if __name__ == "__main__" block too (e.g. indent them all under the current block, perhaps). An alternative fix would be to put the code that uses pcl into functions (which can be declared outside the guard block), then call the functions from within an if __name__ == "__main__" block. That would look something like this:

def do_stuff_with_pcl(pcl):
    print(pcl)

if __name__ == "__main__":
    # multiprocessing code, etc
    pcl = ...
    do_stuff_with_pcl(pcl)
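
For something you can run as-is, here is a fuller sketch of the same layout. The body of hist_price is a stand-in I made up (it derives numbers from the characters of the symbol so nothing has to be downloaded), the symbols are placeholders, and main is just a convenient name for the guarded code. Swap in your real lookup and the rest of your pcl-dependent code.

```python
from multiprocessing import Pool

syms = ["AAA", "BBB", "CCC"]  # placeholder symbols, not real tickers


def hist_price(sym):
    # Stand-in for the real lookup: derives a max/min from the symbol's
    # character codes so the example runs without any data source.
    codes = [ord(ch) for ch in sym]
    return {"Sym": sym, "10D Max": max(codes), "10D Min": min(codes)}


def do_stuff_with_pcl(pcl):
    # Everything that depends on pcl lives in a function that takes it
    # as an argument, so it is safe to define at module level.
    return [row["Sym"] for row in pcl]


def main():
    # The with-block terminates the pool on exit; map() has already
    # collected every result by then.
    with Pool(4) as pool:
        pcl = pool.map(hist_price, syms)
    print(do_stuff_with_pcl(pcl))


if __name__ == "__main__":
    main()
```

Only the call to main() sits under the guard. A child process that reloads the file sees the function definitions and the syms list, but never starts a pool of its own.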

As for why the issue came up in the first place, the ultimate cause is using the multiprocessing module on Windows. You can read about the issue in the documentation.

When multiprocessing creates a new process for its Pool, it needs to initialize that process with a copy of the current module's state. Because Windows doesn't have fork (which copies the parent process's memory into a child process automatically), Python needs to set everything up from scratch. In each child process, it loads the module from its file, and if the module's top-level code tries to create a new Pool, you'd have a recursive situation where each of the child processes would start spawning a whole new set of child processes of its own.

The multiprocessing code has some guards against that, I think (the RuntimeError you got when you removed the block looks like one of them, so you won't fork bomb yourself out of simple carelessness), but you still need to do some of the work yourself, by using if __name__ == "__main__" to guard any code that shouldn't be run in the child processes.

Upvotes: 2
