Meir

Reputation: 2050

How does Python inheritance detection work?

I have a base class and several subclasses that inherit from it. I am trying to detect dynamically which subclasses inherit from the base class. I currently do this by dynamically importing all the subclasses in the base class's __init__() and then calling the __subclasses__() method.

I have the following file structure:

proj/
|-- __init__.py
|-- base.py
`-- sub
    |-- __init__.py
    |-- sub1.py
    |-- sub2.py
    `-- sub3.py

base.py:

import importlib

class Base(object):
    def __init__(self):
        importlib.import_module('sub.sub1')
        importlib.import_module('sub.sub2')
        importlib.import_module('sub.sub3')

    @classmethod
    def inheritors(cls):
        print(cls.__subclasses__())

b = Base()

b.inheritors()

sub1.py:

import sys
import os

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from base import Base

class Sub1(Base):
    pass

sub2.py:

import sys
import os

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from base import Base

class Sub2(Base):
    pass

and finally sub3.py:

import sys
import os

class Sub3(object):
    pass

You will notice that sub.sub1.Sub1 and sub.sub2.Sub2 both inherit from base.Base while sub.sub3.Sub3 does not.

When I open IPython3, and run import base I get the following output:

In [1]: import base
[<class 'sub.sub1.Sub1'>, <class 'sub.sub2.Sub2'>]

The output above is exactly as I would expect it to be. It gets weird when I run base.py using Python command line:

python3 base.py
[<class 'sub.sub2.Sub2'>]
[]

Now I think that I understand that there are two prints in the second case because the Python importer initially does not see base.py in the sys.modules global variable, so when a subclass is imported it will import base.py again and the code will be executed a second time. This explanation does not explain why the first time it prints [<class 'sub.sub2.Sub2'>] and not [<class 'sub.sub1.Sub1'>] as sub.sub1.Sub1 is imported first, and it does not explain why only sub.sub2.Sub2 appears in the __subclasses__() while sub.sub1.Sub1 does not.

Any explanation that would help me understand how Python works in this regard will be greatly appreciated!

EDIT: I would like to run the module using python base.py, so maybe I can be pointed in the correct direction for that?

Upvotes: 1

Views: 264

Answers (1)

jsbueno

Reputation: 110248

You made a knot. A complicated, unneeded knot. I could figure it out, but I don't know if I can keep it all in mind well enough to explain clearly what is going on :-)

But one thing first: this has less to do with "inheritance detection" and everything to do with the import system, which you tied in a complicated knot.

So, you get the unexpected result because when you run python base.py, the contents of base.py are recorded in sys.modules as the module named __main__. Ordinarily, Python never imports a module and runs the same code twice: upon finding an import statement that targets an already-imported module, it just creates a new variable pointing to the existing module. If that module has not finished executing its body yet, not all of its classes or variables will be visible at the point of the second import statement. Calls to importlib do no better: they behave the same way and just don't automate the variable-binding part. When you do circular imports, change the import path, and import a module named base from another file, Python does not know this is the same base.py that is already running as __main__. So the new one gets a fresh import and a second entry in sys.modules, as base.
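To see the effect in isolation, here is a minimal sketch (not the code from the question) that loads one source file under two different module names. The file contents, the helper load_as and the module name script_copy are made up for the illustration; the point is only that each load produces its own, unrelated Base class.

```python
import importlib.util
import os
import tempfile

# Hypothetical stand-in for base.py: a module that defines a single class.
SOURCE = "class Base(object):\n    pass\n"


def load_as(name, path):
    """Execute the file at `path` as a brand-new module object called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "base.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)
    # Same file, two module names: roughly what __main__ and base are above.
    script_copy = load_as("script_copy", path)
    imported_copy = load_as("base", path)


# A subclass of one copy is invisible to the other copy.
class Sub(imported_copy.Base):
    pass


print(script_copy.Base is imported_copy.Base)   # False
print(script_copy.Base.__subclasses__())        # []
print(imported_copy.Base.__subclasses__())      # one entry: Sub
```

The two Base objects share source code but nothing else, which is why a subclass registered against one never shows up on the other.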

If you just print the __class__ in your inheritors method, it will be clear:

@classmethod
def inheritors(cls):
    print("At class {}. Subclasses: {}".format(__class__, cls.__subclasses__()))

Then you will see that "base.Base" has the "sub2" subclass and __main__.Base has no subclasses.

Now, let me try to put the timeline for it:

  1. base.py is run as __main__ and executes up to the line b = Base(). At this point the __init__ method of Base starts importing the submodules.
  2. The submodule sub.sub1 is run; it changes sys.path and imports base.py again, this time as the base module.
  3. The body of the base module runs until its own b = Base() line calls the __init__ method of base.Base; therein, it imports sub.sub1, and Python finds that this module has already been imported and is in sys.modules. Its code has not finished running, and the Sub1 class is not yet defined, though.
  4. Still inside the import of base triggered by sub1, __init__ tries to import sub.sub2. That is a new module to Python, so it is imported.
  5. During the import of sub2, when from base import Base is reached, Python recognizes the module as already imported (although, again, its initialization code has not finished); it just binds the name Base in sub2's globals and carries on.
  6. Sub2 is defined as a subclass of base.Base.
  7. The sub.sub2 import finishes, and Python returns to the __init__ method from step (4); it imports sub.sub3 and then moves on to the b.inheritors() call (the one in base, not in __main__). At this point the only subclass of base.Base is Sub2, and that is what gets printed.
  8. The import of base.py as base finishes, and Python resumes executing the body of sub.sub1: class Sub1 is defined as a subclass of base.Base.
  9. Python resumes the __main__.Base.__init__ execution and imports sub.sub2, but it has already been run; the same goes for sub.sub3.
  10. __main__.Base.inheritors is called in __main__, and prints no subclasses.

And that is the end of a complicated story.

What you should be doing

First: if you need to do the sys.path.append trickery, there is something wrong with your package. Let your package be proj, and have proj/__init__.py import base if you want that code to run (and dynamically import the other modules), but stop fiddling with sys.path to find things in your own package.
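For the "dynamically import the other modules" part, one possible sketch uses pkgutil to walk a package and import each submodule by its full dotted name, so no sys.path changes are needed. The helper name import_submodules is hypothetical, and the standard-library json package is used below only so that the example runs anywhere; in your layout the argument would presumably be something like "proj.sub", with the submodules importing Base through the package (for example from proj.base import Base).

```python
import importlib
import pkgutil


def import_submodules(package_name):
    """Import every direct submodule of a package; return {dotted_name: module}."""
    package = importlib.import_module(package_name)
    modules = {}
    # iter_modules lists what lives directly inside the package directory;
    # the prefix turns "decoder" into "json.decoder", and so on.
    for info in pkgutil.iter_modules(package.__path__, package.__name__ + "."):
        if info.name.endswith(".__main__"):
            continue  # skip command-line entry points
        modules[info.name] = importlib.import_module(info.name)
    return modules


# Demonstration on a standard-library package.
loaded = import_submodules("json")
print(sorted(loaded))
```

Because every module is imported under one consistent dotted name, each file ends up in sys.modules exactly once.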

Second: the cls.__subclasses__() call is of little use, as it only tells you about the immediate subclasses of cls; a grandchild subclass will go unnoticed.
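A small illustration of that limitation, together with one common workaround: walking __subclasses__() recursively. The class names and the helper all_subclasses are made up for this sketch.

```python
def all_subclasses(cls):
    """Return every direct and indirect subclass of cls, without duplicates."""
    found = []
    for sub in cls.__subclasses__():
        if sub not in found:
            found.append(sub)
        # Recurse so that grandchildren (and deeper) are collected too.
        for deeper in all_subclasses(sub):
            if deeper not in found:
                found.append(deeper)
    return found


class Base:
    pass


class Child(Base):
    pass


class GrandChild(Child):
    pass


print(Base.__subclasses__())   # only Child
print(all_subclasses(Base))    # Child and GrandChild
```

This still only sees classes whose modules have actually been imported.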

The most usual pattern is to keep a registry of the subclasses of your Base and, as they are created, just add the new classes to this record. This can be done with a metaclass in Python < 3.6, or with the __init_subclass__ method in Python 3.6 and later.
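A minimal sketch of the __init_subclass__ variant, assuming Python 3.6 or later. The attribute name registry and the example subclasses are illustrative choices, not something taken from the question.

```python
class Base:
    # Every subclass, direct or indirect, is appended here when it is created.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.registry.append(cls)


class Sub1(Base):
    pass


class Sub2(Base):
    pass


class Sub2Child(Sub2):   # a grandchild is registered as well
    pass


class Unrelated:         # does not inherit from Base, so it is not registered
    pass


print(Base.registry)
```

Unlike __subclasses__(), the hook also fires for grandchildren, because __init_subclass__ is inherited. The registry is only as complete as the set of modules that have been imported, so the subclass modules still need to be imported once, under a single consistent name.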

Upvotes: 3
