Idan Arye

Reputation: 12603

Creating a pseudo-module that creates submodules at runtime

To support extensions in my Python project, I'm trying to create a pseudo-module that serves "extension modules" as its submodules. I'm having trouble treating the submodules as modules: it seems I have to access them using from..import on the main pseudo-module, and I can't just import them by their full path.

Here is a minimal working example:

import sys
from types import ModuleType


class Foo(ModuleType):
    @property
    def bar(self):
        # Here I would actually find the location of `bar.py` and load it
        bar = ModuleType('foo.bar')
        sys.modules['foo.bar'] = bar
        return bar


sys.modules['foo'] = Foo('foo')

from foo import bar  # without this line the next line fails
import foo.bar

This works, but if I comment out the from foo import bar line, it'll fail with:

ImportError: No module named bar

on Python 2, and on Python 3 it'll fail with:

ModuleNotFoundError: No module named 'foo.bar'; 'foo' is not a package

If I add the fields to make it a package:

class Foo(ModuleType):
    __all__ = ('bar',)
    __package__ = 'foo'
    __path__ = []
    __file__ = __file__

It fails with:

ModuleNotFoundError: No module named 'foo.bar'

From what I understand, the problem is that I did not set sys.modules['foo.bar'] yet. But... to fill sys.modules I need to load the module first, and I don't want to do it unless the user of my project explicitly imports it.
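A quick way to see what happens at that point is to put a recording finder at the front of sys.meta_path. This sketch assumes Python 3.6+ (for find_spec and ModuleNotFoundError), and RecordingFinder is a hypothetical name. It suggests that a missing sys.modules['foo.bar'] entry is not the end of the lookup: once foo has a __path__, the import system asks every finder on sys.meta_path for 'foo.bar', passing foo.__path__, and only raises when all of them decline.

```python
import sys
import importlib.abc
from types import ModuleType


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that records every request and always declines."""

    def __init__(self):
        self.requests = []

    def find_spec(self, fullname, path, target=None):
        self.requests.append((fullname, path))
        return None  # decline, so the next finder on sys.meta_path is tried


# A bare package-like module: the empty __path__ is what makes it a package
foo = ModuleType("foo")
foo.__path__ = []
sys.modules["foo"] = foo

recorder = RecordingFinder()
sys.meta_path.insert(0, recorder)
try:
    import foo.bar
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'foo.bar'
finally:
    sys.meta_path.remove(recorder)

print(recorder.requests)  # [('foo.bar', [])]
```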

Is there any way to make Python realize that when it sees import foo.bar it needs to load foo first (or I can just guarantee foo will already be loaded at that point) and take bar from it?

Upvotes: 1

Views: 632

Answers (2)

Tadhg McDonald-Jensen

Reputation: 21453

This post does NOT explain "this is how you do it". If you want to know how to do it yourself, look at PEP 302 or Idan Arye's solution. Instead, this post presents a recipe that makes such pseudo-modules easy to write. The recipe is at the end of this answer.


The recipe defines two classes intended for use: PseudoModule and PseudoPackage. The only difference in behaviour is whether import foo.x should raise an error stating that foo isn't a package, or try to load x and make sure it's a module. Several example uses are outlined below.

PseudoModule

PseudoModule can be used as a decorator on a function. It creates a new module object that, the first time an attribute is accessed, calls the decorated function with the name of the attribute and the namespace of previously defined elements.

For example, this will make a module that assigns a new integer to each attribute accessed:

@PseudoModule
def access_tracker(attr, namespace):
    namespace["_count"] = namespace.get("_count", -1) + 1
    return namespace["_count"]

#PseudoModule will set `namespace[attr] = <return value>` for you
#this can be overridden by passing `remember_results=False` to the constructor

sys.modules["access_tracker"] = access_tracker

from access_tracker import zero, one, two, three

assert zero == 0 and one == 1 and two == 2 and three == 3
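For comparison, Python 3.7+ supports a module-level __getattr__ (PEP 562) that is looked up in the module's own namespace, so a similar lazily computed module can be sketched from a plain ModuleType without subclassing. This is a swapped-in technique, not how PseudoModule works, and make_lazy_module is a hypothetical helper. The dunder guard matters: the import system probes names such as __path__, and generating a value for one of those would make the module look like a package.

```python
import sys
from types import ModuleType


def make_lazy_module(name, generate):
    """Hypothetical helper: missing attributes are computed by generate(attr, namespace)."""
    module = ModuleType(name)

    def __getattr__(attr):
        # Called only when normal lookup fails (PEP 562, Python 3.7+).
        # Refuse dunder names such as __path__ that the import system probes.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError("module {!r} has no attribute {!r}".format(name, attr))
        value = generate(attr, vars(module))
        setattr(module, attr, value)  # cache, so generate() runs once per name
        return value

    module.__getattr__ = __getattr__  # stored in the module's __dict__
    return module


def next_count(attr, namespace):
    namespace["_count"] = namespace.get("_count", -1) + 1
    return namespace["_count"]


sys.modules["lazy_counter"] = make_lazy_module("lazy_counter", next_count)

from lazy_counter import zero, one, two

assert (zero, one, two) == (0, 1, 2)
```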

PseudoPackage

PseudoPackage is used the same way as PseudoModule; however, if the decorated function returns a module (or package), PseudoPackage corrects its name to be qualified as a subpackage, and sys.modules is updated as needed. (The top-level package still needs to be added to sys.modules manually.)

Here is an example use of PseudoPackage:

spam_submodules = {"bacon"}
spam_attributes = {"eggs", "ham"}

@PseudoPackage
def spam(name, namespace):
    print("getting a component of spam:", name)
    if name in spam_submodules:
        @PseudoModule
        def submodule(attr, nested_namespace):
            print("getting a component of submodule {}: {}".format(name, attr))
            return attr #use the string of the attribute
        return submodule #PseudoPackage will rename the module to be spam.bacon for us
    elif name in spam_attributes:
        return "supported attribute"
    else:
        raise AttributeError("spam doesn't have any {!r}.".format(name))

sys.modules["spam"] = spam

import spam.bacon
#prints "getting a component of spam: bacon"

assert spam.bacon.something == "something"
#prints "getting a component of submodule bacon: something"

from spam import eggs
#prints "getting a component of spam: eggs"
assert eggs == "supported attribute"

import spam.ham #ham isn't a submodule, raises error!

The way PseudoPackage is set up also makes arbitrarily deep packages very easy, although this specific example doesn't accomplish much:

def make_abstract_package(qualname = ""):
    "makes a PseudoPackage that has arbitrary nesting of subpackages"
    def gen_func(attr, namespace):
        print("getting {!r} from package {!r}".format(attr, qualname))
        return make_abstract_package("{}.{}".format(qualname, attr))
    #can pass the name of the module as second argument if needed
    return PseudoPackage(gen_func, qualname) 

sys.modules["foo"] = make_abstract_package("foo")

from foo.bar.baz import thing_I_want
##prints:
# getting 'bar' from package 'foo'
# getting 'baz' from package 'foo.bar'
# getting 'thing_I_want' from package 'foo.bar.baz'
print(thing_I_want)
#prints "<module 'foo.bar.baz.thing_I_want' from '<PseudoPackage>'>"

A few notes on the implementation

As general guidelines:

  • The function that computes attributes of the module should not import the module it's defining the attributes for
  • If you want a package or module to be available for import, you need to put it in sys.modules yourself.
  • PseudoPackage assumes each submodule is unique; don't reuse module objects.

It is also worth noting that sys.modules is only updated with submodules of PseudoPackages when an import statement requires the name to be a module. For example, if foo is a package already in sys.modules but foo.x has not been referenced yet, then all these assertions will pass:

assert "foo.x" not in sys.modules and not hasattr(foo,"x")
import foo; foo.x #foo.x is computed but not added to sys.modules
assert "foo.x" not in sys.modules and hasattr(foo,"x")
from foo import x #x is retrieved from namespace but sys.modules is still not affected
assert "foo.x" not in sys.modules

import foo.x #if x is a module then "foo.x" is added to sys.modules
assert "foo.x" in sys.modules

Also, in the above case, if foo.x isn't a module, then the statement import foo.x raises a ModuleNotFoundError.

Finally, while the problematic edge cases I have identified can be avoided by following the guidelines above, the docstring of _PseudoPackageLoader describes the implementation details responsible for the unwanted behaviour, in case the recipe is modified in the future.


The recipe

import sys
from types import ModuleType
import importlib.abc #uses Loader and MetaPathFinder, more for inspection purposes than use

class RawPseudoModule(ModuleType):
    """
    see PseudoModule for documentation, this class is not intended for direct use.

    RawPseudoModule does not handle __path__, so the generating function of a direct
    instance is expected to make and return an appropriate value for __path__

    *** if you do not know what an appropriate value for __path__ is
        then use PseudoModule instead ***
    """
    #using slots keeps these two variables out of the module dictionary
    __slots__ = ["__generating_func", "__remember_results"]
    def __init__(self, func, name=None, remember_results = True):
        name = name or func.__name__
        super(RawPseudoModule, self).__init__(name)
        self.__file__ = "<{0.__class__.__name__}>".format(self)
        self.__generating_func = func
        self.__remember_results = remember_results

    def __getattr__(self, attr):
        value = self.__generating_func(attr, vars(self))
        if self.__remember_results:
            setattr(self, attr, value)
        return value


class PseudoModule(RawPseudoModule):
    """
    A module that has attributes generated from a specified function

    The generating function passed to the constructor should have the signature:
       f(attr:str, namespace:dict) -> object:
          - attr is the name of the attribute accessed
          - namespace is the currently defined values in the module

    the function should return a value for the attribute or raise an AttributeError if it doesn't exist.

    by default the result is then saved to the namespace so you don't
    have to explicitly do "namespace[attr] = <value>" however this behaviour
    can be overridden by specifying "remember_results = False" in the constructor.

    If no name is specified in the constructor the function name will be
    used for the module name instead, this allows the class to be used as a decorator

    Note: the PseudoModule class is set up so that "import foo.bar"
          when foo is a PseudoModule will fail stating "'foo' is not a package".
         -  to allow importing submodules use PseudoPackage.
         -  to handle the internal __path__ manually use RawPseudoModule.

    Note: the module is NOT added to sys.modules automatically.
    """
    def __getattr__(self, attr):
        #to not have submodules then __path__ must not exist
        if attr == "__path__":
            msg = "{0.__name__} is a PseudoModule, it is not a package so it doesn't have a __path__"
            #this error message would only be seen by people who explicitly access __path__
            raise AttributeError(msg.format(self))
        return super(PseudoModule, self).__getattr__(attr)

class PseudoPackage(RawPseudoModule):
    """
    A version of PseudoModule that sets itself up to allow importing subpackages

    When a submodule is imported from a PseudoPackage:
    - it is evaluated with the generating function.
    - the name of the submodule is overridden to be correctly qualified
    - and it is added to sys.modules to allow repeated imports.

    Note: the top level package still needs to be added to sys.modules manually

    Note: A RecursionError will be raised if the code that generates submodules
          attempts to import another submodule from the PseudoPackage.
    """
    #IMPLEMENTATION DETAIL: technically this doesn't deal with adding submodules to
    #  sys.modules, that is handled in _PseudoPackageLoader
    #  which explicitly checks for instances of PseudoPackage

    __path__ = [] #packages must have a __path__ to be recognized as packages.

    def __getattr__(self, attr):
        value = super(PseudoPackage, self).__getattr__(attr)
        if isinstance(value, ModuleType):
            #I'm just going to say if it's a module then the name must be in this format.
            value.__name__ = self.__name__ + "." + attr
        return value


class _PseudoPackageLoader(importlib.abc.Loader, importlib.abc.MetaPathFinder):
    """
    Singleton finder and loader for pseudo packages

    Whenever a subpackage of a PseudoPackage (that is already in sys.modules) is imported
    this will handle loading it and adding the subpackage to sys.modules

    Note that although PEP 302 states the finder should not depend on the parent
    being loaded in sys.modules, this is implemented under the understanding that
    the user of PseudoPackage will add their module to sys.modules manually themselves
    so this will work only when the parent is present in sys.modules

    Also PEP 302 indicates the module should be added to sys.modules first in case
    it is imported during its execution, however this is impossible due to the
    nature of how the module actually gets loaded.
    So for heaven's sake don't try to import a pseudo package or a module that uses
    a pseudo package from within the code that generates it.

    I have only tested this when the sub module is either PseudoModule or PseudoPackage
    and it was created new from the generating function, ideally there would be a way
    to allow the generating function to return an unexecuted module and this would
    properly handle executing it but I don't know how to deal with that.
    """
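    def find_spec(self, fullname, path, target=None):
        #Python 3.4+ asks meta path finders for a spec first, and 3.12+ no longer
        #falls back to find_module, so wrap the same check in a ModuleSpec.
        #this loader has no exec_module, so the import system then calls load_module below
        if self.find_module(fullname, path) is None:
            return None
        from importlib.machinery import ModuleSpec
        return ModuleSpec(fullname, self)

    def find_module(self, fullname, path):
        #this will only support loading if the parent package is a PseudoPackage
        base,_,_ = fullname.rpartition(".")
        if isinstance(sys.modules.get(base), PseudoPackage):
            return self
        #I found that `if path is PseudoPackage.__path__` worked the same way for all the cases I tested
        #however since load_module will fail if the base part isn't in sys.modules
        # it seems safer to just check for that.
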

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        base,_,sub = fullname.rpartition(".")
        parent = sys.modules[base]
        try:
            submodule = getattr(parent, sub)
        except AttributeError:
            #when we just access `foo.x` it raises an AttributeError
            #but `import foo.x` should instead raise an ImportError
            raise ImportError("cannot import name {!r}".format(sub))

        if not isinstance(submodule, ModuleType):
            #match the format of error raised when the submodule isn't a module
            #example: `import sys.path` raises the same format of error.
            raise ModuleNotFoundError("No module named {!r}".format(fullname))
        #fill all the fields as described in PEP 302 except __name__
        submodule.__loader__ = self
        submodule.__package__ = base
        submodule.__file__ = getattr(submodule, "__file__", "<submodule of PseudoPackage>")
        #if there was a way to do this before the module was made that'd be nice
        sys.modules[fullname] = submodule
        #if we needed to execute the body of an unloaded module it'd be done here.
        return submodule

#add the loader to sys.meta_path so it will handle our pseudo packages
sys.meta_path.append(_PseudoPackageLoader())

Upvotes: 3

Idan Arye

Reputation: 12603

Thanks to the link @TadhgMcDonald-Jensen provided, I managed to solve it:

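import sys
from types import ModuleType


class FooImporter(object):
    module = ModuleType('foo')
    module.__path__ = [module.__name__]

    def find_spec(self, fullname, path, target=None):
        # Python 3.4+ asks for a spec first and 3.12+ no longer calls find_module,
        # so reuse its check; load_module is still used as the loader fallback
        if self.find_module(fullname, path) is not self:
            return None
        from importlib.machinery import ModuleSpec  # Python 3 only; Python 2 never calls find_spec
        return ModuleSpec(fullname, self)

    def find_module(self, fullname, path):
        if fullname == self.module.__name__:
            return self
        if path == [self.module.__name__]:
            return self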

    def load_module(self, fullname):
        if fullname == self.module.__name__:
            return sys.modules.setdefault(fullname, self.module)
        assert fullname.startswith(self.module.__name__ + '.')
        try:
            return sys.modules[fullname]
        except KeyError:
            submodule = ModuleType(fullname)

            name = fullname[len(self.module.__name__) + 1:]
            setattr(self.module, name, submodule)

            sys.modules[fullname] = submodule

            return submodule


sys.meta_path.append(FooImporter())

from foo import bar
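find_module and load_module belong to the older PEP 302 protocol: since Python 3.4 the import system asks meta path finders for find_spec first, and Python 3.12 no longer falls back to find_module at all. Below is a minimal sketch of the same idea using the PEP 451 methods (find_spec, create_module, exec_module), assuming Python 3.4+. FooFinder and the loaded_by attribute are hypothetical, and exec_module is where the real extension code would be located and executed.

```python
import sys
import importlib.abc
import importlib.machinery


class FooFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader: serves `package` and creates its direct submodules on demand."""

    def __init__(self, package):
        self.package = package

    def find_spec(self, fullname, path, target=None):
        if fullname == self.package:
            # is_package=True gives the module an empty __path__,
            # so "import foo.bar" is allowed to look for submodules
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        parent, _, _ = fullname.rpartition(".")
        if parent == self.package:
            return importlib.machinery.ModuleSpec(fullname, self)
        return None  # not ours; let the other finders handle it

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # A real implementation would locate and run the extension here,
        # e.g. bar.py for "foo.bar". This sketch just marks the module.
        module.loaded_by = type(self).__name__


# Insert first so a real "foo" on sys.path cannot shadow the pseudo-package
sys.meta_path.insert(0, FooFinder("foo"))

import foo.bar       # no "from foo import bar" needed beforehand
from foo import baz  # the fromlist imports foo.baz on demand too

print(foo.bar.loaded_by)  # FooFinder
```

Checking the parent's name instead of comparing path to a fixed list keeps the finder independent of what __path__ contains, and because the import system itself puts each submodule into sys.modules, import foo.bar works without a prior from foo import bar.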

@TadhgMcDonald-Jensen - please post an answer so that I can accept it.

Upvotes: 1
