Reputation: 38462
If I execfile a module and remove all of my references to that module, its functions continue to work as expected. That's normal.
However, if that execfile'd module imports other modules, and I remove all references to those modules, the functions defined in those modules start to see all their global values as None. This causes things to fail spectacularly, of course, and in a very surprising manner (TypeError NoneType on string constants, for example).
I'm surprised that the interpreter makes a special case here; execfile doesn't seem special enough to cause functions to behave differently wrt module references.
My question: Is there any clean way to make the execfile-function behavior recursive (or global for a limited context) with respect to modules imported by an execfile'd module?
For the curious:
The application is reliable configuration reloading under buildbot. The buildbot configuration is executable python, for better or for worse. If the executable configuration is a single file, things work fairly well. If that configuration is split into modules, any imports from the top-level file get stuck to the original version, due to the semantics of __import__ and sys.modules. My strategy is to hold the contents of sys.modules constant before and after configuration, so that each reconfig looks like an initial configuration. This almost works except for the above function-global reference issue.
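To make the sys.modules stickiness concrete, here is a minimal sketch. It is written for Python 3 (unlike the Python 2 demo in this post), and the module name `cache_demo_mod` and the helper `import_twice` are made up for illustration. It imports a throwaway module, rewrites the source file, and imports it again:

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_twice():
    """Import a throwaway module, rewrite its source, then import it again."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, 'cache_demo_mod.py')  # hypothetical module name
    with open(path, 'w') as f:
        f.write("VALUE = 'old'\n")
    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        first = importlib.import_module('cache_demo_mod')
        with open(path, 'w') as f:
            f.write("VALUE = 'new'\n")
        # The second import is answered from sys.modules; the file is not re-read.
        second = importlib.import_module('cache_demo_mod')
        return first is second, second.VALUE
    finally:
        # Clean up so the demo leaves no trace behind.
        sys.path.remove(tmpdir)
        sys.modules.pop('cache_demo_mod', None)
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Calling `import_twice()` should give `(True, 'old')`: both imports return the same module object, and the edit to the file is never seen. That is the behavior the sys.modules snapshot is meant to work around.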
Here's a repeatable demo of the issue:
import gc
import sys
from textwrap import dedent


class DisableModuleCache(object):
    """Defines a context in which the contents of sys.modules is held constant.
    i.e. Any new entries in the module cache (sys.modules) are cleared when exiting this context.
    """
    modules_before = None

    def __enter__(self):
        self.modules_before = sys.modules.keys()

    def __exit__(self, *args):
        for module in sys.modules.keys():
            if module not in self.modules_before:
                del sys.modules[module]
        gc.collect()  # force collection after removing refs, for demo purposes.


def reload_config(filename):
    """Reload configuration from a file"""
    with DisableModuleCache():
        namespace = {}
        exec open(filename) in namespace
        config = namespace['config']
        del namespace
    config()


def main():
    open('config_module.py', 'w').write(dedent('''
        GLOBAL = 'GLOBAL'
        def config():
            print 'config! (old implementation)'
            print GLOBAL
    '''))

    # if I exec that file itself, its functions maintain a reference to its modules,
    # keeping GLOBAL's refcount above zero
    reload_config('config_module.py')
    ## output:
    #config! (old implementation)
    #GLOBAL

    # If that file is once-removed from the exec, the functions no longer maintain a reference to their module.
    # The GLOBAL's refcount goes to zero, and we get a None value (feels like weakref behavior?).
    open('main.py', 'w').write(dedent('''
        from config_module import *
    '''))
    reload_config('main.py')
    ## output:
    #config! (old implementation)
    #None
    ## *desired* output:
    #config! (old implementation)
    #GLOBAL

    acceptance_test()


def acceptance_test():
    # Have to wait at least one second between edits (on ext3),
    # or else we import the old version from the .pyc file.
    from time import sleep
    sleep(1)
    open('config_module.py', 'w').write(dedent('''
        GLOBAL2 = 'GLOBAL2'
        def config():
            print 'config2! (new implementation)'
            print GLOBAL2

            ## There should be no such thing as GLOBAL. Naive reload() gets this wrong.
            try:
                print GLOBAL
            except NameError:
                print 'got the expected NameError :)'
            else:
                raise AssertionError('expected a NameError!')
    '''))
    reload_config('main.py')
    ## output:
    #config2! (new implementation)
    #None
    #got the expected NameError :)
    ## *desired* output:
    #config2! (new implementation)
    #GLOBAL2
    #got the expected NameError :)


if __name__ == '__main__':
    main()
Upvotes: 1
Views: 263
Reputation: 64318
You should consider importing the configuration instead of exec'ing it.
I use import for a similar purpose, and it works great (specifically, importlib.import_module(mod)). Though my configs consist mainly of primitives, not real functions.
Like you, I also have a "guard" context to restore the original contents of sys.modules after the import. Plus, I use sys.dont_write_bytecode = True (of course, you can add that to your DisableModuleCache: set it to True in __enter__ and to False in __exit__). This would ensure the config actually "runs" each time you import it.
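A rough sketch of that guard, written for Python 3. The names `fresh_imports` and `load_config` are made up for illustration, and this is not the Figura implementation. It restores the previous value of sys.dont_write_bytecode on exit rather than forcing it to False:

```python
import contextlib
import importlib
import sys


@contextlib.contextmanager
def fresh_imports():
    """Restore sys.modules on exit and skip .pyc writing while inside."""
    modules_before = set(sys.modules)
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True
    try:
        yield
    finally:
        sys.dont_write_bytecode = old_flag
        # Evict every module that was imported inside the context.
        for name in list(sys.modules):
            if name not in modules_before:
                del sys.modules[name]


def load_config(module_name):
    """Import module_name from scratch and return the module object."""
    with fresh_imports():
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

Because `load_config` hands back the module object itself, the caller keeps a reference to the top-level config module for as long as it uses the config. Modules that the config imports in turn are still evicted from sys.modules, so anything needed from them should be reachable through an object the caller holds.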
The main difference between the two approaches (other than the fact that you don't have to rely on the state the interpreter stays in after exec'ing, which I consider semi-unclean) is that the config files are identified by their module name/path (as used for importing) rather than the file name.
EDIT: A link to the implementation of this approach, as part of the Figura package.
Upvotes: 1
Reputation: 419
I don't think you need the 'acceptance_test' part of things here. The issue isn't actually weakrefs, it's modules' behavior on destruction. They clear out their __dict__ on delete. I vaguely remember that this is done to break ref cycles. I suspect that global references in function closures do something fancy to avoid a hash lookup on every invocation, which is why you get None and not a NameError.
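A small sketch of the mechanism, written for Python 3 with a hand-built module (the name `demo_mod` is made up). A function holds a reference to its module's namespace dict, not to the module object. As far as I know, CPython 2 rebinds each global to None when a module is destroyed instead of removing the names, which would explain the None without any lookup trick. The loop below only simulates that step by hand:

```python
import types

# Build a throwaway module object by hand (the name 'demo_mod' is made up).
mod = types.ModuleType('demo_mod')
exec("GLOBAL = 'GLOBAL'\ndef config():\n    return GLOBAL\n", mod.__dict__)
config = mod.config

# A function references the module's namespace dict, not the module itself.
shares_dict = config.__globals__ is mod.__dict__

before = config()

# Simulate the teardown described above: rebind every global to None
# (leaving __builtins__ alone) rather than deleting the names.
for name in list(mod.__dict__):
    if name != '__builtins__':
        mod.__dict__[name] = None

after = config()  # the name lookup still succeeds, but now finds None
```

Here `shares_dict` is True, `before` is 'GLOBAL', and `after` is None: the name GLOBAL is still present in the dict, so there is no NameError.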
Here's a much shorter SSCCE:
import gc
import sys
import contextlib
from textwrap import dedent


@contextlib.contextmanager
def held_modules():
    modules_before = sys.modules.keys()
    yield
    for module in sys.modules.keys():
        if module not in modules_before:
            del sys.modules[module]
    gc.collect()  # force collection after removing refs, for demo purposes.


def main():
    open('config_module.py', 'w').write(dedent('''
        GLOBAL = 'GLOBAL'
        def config():
            print 'config! (old implementation)'
            print GLOBAL
    '''))
    open('main.py', 'w').write(dedent('''
        from config_module import *
    '''))

    with held_modules():
        namespace = {}
        exec open('main.py') in namespace
        config = namespace['config']
    config()


if __name__ == '__main__':
    main()
Or, to put it another way, don't delete modules and expect their contents to continue functioning.
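If the goal is only to empty the import cache, one possible workaround is to evict the modules from sys.modules while keeping the module objects referenced somewhere. A minimal sketch for Python 3, where the name `held_modules_keepalive` and the `keep` list are hypothetical additions and not part of the code above:

```python
import contextlib
import sys


@contextlib.contextmanager
def held_modules_keepalive(keep):
    """Like held_modules(), but park evicted modules in the list `keep`."""
    modules_before = set(sys.modules)
    try:
        yield
    finally:
        for name in list(sys.modules):
            if name not in modules_before:
                # pop() drops the cache entry; appending keeps the module
                # object (and therefore its globals) alive.
                keep.append(sys.modules.pop(name))
```

Usage would be `kept = []` followed by `with held_modules_keepalive(kept): ...`. The functions pulled out of the config keep working for as long as `kept` is referenced, and dropping `kept` releases the old modules once a newer config has replaced them.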
Upvotes: 1