bukzor

Reputation: 38462

python: functions *sometimes* maintain a reference to their module

If I execfile a module and then remove all of my references to that module, its functions continue to work as expected. That's normal.

However, if that execfile'd module imports other modules, and I remove all references to those modules, the functions defined in those modules start to see all of their global values as None. This causes things to fail spectacularly, of course, and in a very surprising manner (a TypeError about NoneType where a string constant should be, for example).

I'm surprised that the interpreter makes a special case here; execfile doesn't seem special enough to cause functions to behave differently with respect to module references.

My question: Is there any clean way to make the execfile-function behavior recursive (or global for a limited context) with respect to modules imported by an execfile'd module?


To the curious:

The application is reliable configuration reloading under buildbot. The buildbot configuration is executable python, for better or for worse. If the executable configuration is a single file, things work fairly well. If that configuration is split into modules, any imports from the top-level file get stuck to the original version, due to the semantics of __import__ and sys.modules. My strategy is to hold the contents of sys.modules constant before and after configuration, so that each reconfig looks like an initial configuration. This almost works except for the above function-global reference issue.


Here's a repeatable demo of the issue:

import gc
import sys
from textwrap import dedent


class DisableModuleCache(object):
    """Defines a context in which the contents of sys.modules is held constant.
    i.e. Any new entries in the module cache (sys.modules) are cleared when exiting this context.
    """
    modules_before = None
    def __enter__(self):
        self.modules_before = sys.modules.keys()
    def __exit__(self, *args):
        for module in sys.modules.keys():
            if module not in self.modules_before:
                del sys.modules[module]
        gc.collect()  # force collection after removing refs, for demo purposes.


def reload_config(filename):
    """Reload configuration from a file"""
    with DisableModuleCache():
        namespace = {}
        exec open(filename) in namespace
        config = namespace['config']
        del namespace

    config()


def main():
    open('config_module.py', 'w').write(dedent('''
    GLOBAL = 'GLOBAL'
    def config():
        print 'config! (old implementation)'
        print GLOBAL
    '''))

    # if I exec that file itself, its functions maintain a reference to its modules,
    # keeping GLOBAL's refcount above zero
    reload_config('config_module.py')
    ## output:
    #config! (old implementation)
    #GLOBAL

    # If that file is once-removed from the exec, the functions no longer maintain a reference to their module.
    # The GLOBAL's refcount goes to zero, and we get a None value (feels like weakref behavior?).
    open('main.py', 'w').write(dedent('''
    from config_module import *
    '''))

    reload_config('main.py')
    ## output:
    #config! (old implementation)
    #None

    ## *desired* output:
    #config! (old implementation)
    #GLOBAL

    acceptance_test()


def acceptance_test():
    # Have to wait at least one second between edits (on ext3),
    # or else we import the old version from the .pyc file.
    from time import sleep
    sleep(1)

    open('config_module.py', 'w').write(dedent('''
    GLOBAL2 = 'GLOBAL2'
    def config():
        print 'config2! (new implementation)'
        print GLOBAL2

        ## There should be no such thing as GLOBAL. Naive reload() gets this wrong.
        try:
            print GLOBAL
        except NameError:
            print 'got the expected NameError :)'
        else:
            raise AssertionError('expected a NameError!')
    '''))

    reload_config('main.py')
    ## output:
    #config2! (new implementation)
    #None
    #got the expected NameError :)

    ## *desired* output:
    #config2! (new implementation)
    #GLOBAL2
    #got the expected NameError :)



if __name__ == '__main__':
    main()

Upvotes: 1

Views: 263

Answers (2)

shx2

Reputation: 64318

You should consider importing the configuration instead of execing it.

I use import for a similar purpose, and it works great (specifically, importlib.import_module(mod)). My configs consist mainly of primitives, though, not real functions.

Like you, I also have a "guard" context to restore the original contents of sys.modules after the import. Plus, I use sys.dont_write_bytecode = True (of course, you can add that to your DisableModuleCache -- set to True in __enter__ and to False in __exit__). This would ensure the config actually "runs" each time you import it.
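
A minimal sketch of that import-plus-guard idea, written for Python 3 (the question itself uses Python 2). The names restored_modules, load_config, and my_config are made up for illustration and are not taken from Figura or buildbot. Unlike the suggestion above, this version restores the previous value of sys.dont_write_bytecode instead of forcing it to False.

```python
import importlib
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def restored_modules():
    """Drop any sys.modules entries added inside the block; skip .pyc writes."""
    before = set(sys.modules)
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True
    try:
        yield
    finally:
        sys.dont_write_bytecode = old_flag
        for name in list(sys.modules):
            if name not in before:
                del sys.modules[name]


def load_config(module_name):
    """Import the config module fresh and return the module object."""
    importlib.invalidate_caches()  # notice files created since the last import
    with restored_modules():
        return importlib.import_module(module_name)


# Demo: a hypothetical config module in a temporary directory.
config_dir = tempfile.mkdtemp()
sys.path.insert(0, config_dir)
config_path = os.path.join(config_dir, "my_config.py")

with open(config_path, "w") as f:
    f.write("VALUE = 1\n")
first = load_config("my_config")

with open(config_path, "w") as f:
    f.write("VALUE = 2\n")
second = load_config("my_config")

print(first.VALUE, second.VALUE)
```

Because load_config returns the module object itself, the caller holds a reference to it for as long as the config is in use, even though the entry is gone from sys.modules.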

The main difference between the two approaches (other than not having to rely on the state the interpreter is left in after exec'ing, which I consider semi-unclean) is that the config files are identified by their module name/path (as used for importing) rather than by file name.


EDIT: A link to the implementation of this approach, as part of the Figura package.

Upvotes: 1

djmitche

Reputation: 419

I don't think you need the 'acceptance_test' part of things here. The issue isn't actually weakrefs, it's modules' behavior on destruction. They clear out their __dict__ on delete. I vaguely remember that this is done to break ref cycles. I suspect that global references in function closures do something fancy to avoid a hash lookup on every invocation, which is why you get None and not a NameError.
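
To make the mechanism a bit more concrete, here is a small sketch in Python 3 syntax (the thread uses Python 2). It shows that a function refers to its module's namespace dict, not to the module object, and it imitates by hand what module teardown does on Python 2 as far as I can tell: the entries are overwritten with None rather than removed. The module name demo_config is hypothetical.

```python
import types

# Build a throwaway module by hand (hypothetical name).
mod = types.ModuleType("demo_config")
source = '''
GLOBAL = 'GLOBAL'
def config():
    return GLOBAL
'''
exec(source, mod.__dict__)
config = mod.config

# The function's globals are the very same dict object as the module's __dict__.
same_dict = config.__globals__ is mod.__dict__
print(same_dict)

before = config()

# Imitate the teardown step by hand: keep the key, replace the value with None.
config.__globals__['GLOBAL'] = None
after = config()

print(before, after)
```

The key is still present after the overwrite, so the lookup succeeds and yields None instead of raising NameError. My understanding is that CPython 3.4 and later no longer clear the dict when a module object is deallocated, so the original demo may not reproduce there.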

Here's a much shorter SSCCE:

import gc
import sys
import contextlib
from textwrap import dedent


@contextlib.contextmanager
def held_modules():
    modules_before = sys.modules.keys()
    yield
    for module in sys.modules.keys():
        if module not in modules_before:
            del sys.modules[module]
    gc.collect()  # force collection after removing refs, for demo purposes.

def main():
    open('config_module.py', 'w').write(dedent('''
    GLOBAL = 'GLOBAL'
    def config():
        print 'config! (old implementation)'
        print GLOBAL
    '''))
    open('main.py', 'w').write(dedent('''
    from config_module import *
    '''))

    with held_modules():
        namespace = {}
        exec open('main.py') in namespace
        config = namespace['config']
    config()

if __name__ == '__main__':
    main()

Or, to put it another way, don't delete modules and expect their contents to continue functioning.
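
If the modules still have to disappear from sys.modules, one possible workaround is to take them out of the cache but keep the module objects referenced somewhere, so they are never torn down while the config is in use. A rough sketch in Python 3 syntax; the class HeldModules and the module name held_demo_config are invented for this example.

```python
import importlib
import os
import sys
import tempfile


class HeldModules:
    """Remove modules imported inside the block from sys.modules, but keep them alive."""

    def __enter__(self):
        self._before = set(sys.modules)
        self.held = {}
        return self

    def __exit__(self, *exc_info):
        for name in list(sys.modules):
            if name not in self._before:
                # pop() hands back the module object; storing it keeps it alive.
                self.held[name] = sys.modules.pop(name)
        return False


# Demo: a hypothetical config module pulled in indirectly via "import *".
demo_dir = tempfile.mkdtemp()
sys.path.insert(0, demo_dir)
with open(os.path.join(demo_dir, "held_demo_config.py"), "w") as f:
    f.write("GLOBAL = 'GLOBAL'\ndef config():\n    return GLOBAL\n")

importlib.invalidate_caches()
with HeldModules() as guard:
    namespace = {}
    exec("from held_demo_config import *", namespace)
    config = namespace["config"]

print("held_demo_config" in sys.modules)
print(config())
```

Keep guard (or guard.held) around for as long as the config functions are in use; dropping it when the next reconfig finishes lets the old modules be collected.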

Upvotes: 1
