Reputation: 2050
I have a base class and several subclasses that inherit from it. I am trying to detect dynamically which subclasses inherit from the base class. I am currently doing it by importing all the subclasses dynamically in the base class __init__(), and then using the __subclasses__() method.
I have the following file structure:
proj/
|-- __init__.py
|-- base.py
`-- sub
    |-- __init__.py
    |-- sub1.py
    |-- sub2.py
    `-- sub3.py
base.py:
import importlib

class Base(object):
    def __init__(self):
        importlib.import_module('sub.sub1')
        importlib.import_module('sub.sub2')
        importlib.import_module('sub.sub3')

    @classmethod
    def inheritors(cls):
        print(cls.__subclasses__())

b = Base()
b.inheritors()
sub1.py:
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from base import Base

class Sub1(Base):
    pass
sub2.py:
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from base import Base

class Sub2(Base):
    pass
and finally sub3.py:
import sys
import os

class Sub3(object):
    pass
You will notice that sub.sub1.Sub1 and sub.sub2.Sub2 both inherit from base.Base, while sub.sub3.Sub3 does not.
When I open IPython3 and run import base, I get the following output:
In [1]: import base
[<class 'sub.sub1.Sub1'>, <class 'sub.sub2.Sub2'>]
The output above is exactly what I would expect. It gets weird when I run base.py from the command line:
python3 base.py
[<class 'sub.sub2.Sub2'>]
[]
Now, I think I understand why there are two prints in the second case: the Python importer initially does not find base in the sys.modules global variable, so when a subclass is imported it imports base.py again and the code is executed a second time. But this does not explain why the first print shows [<class 'sub.sub2.Sub2'>] and not [<class 'sub.sub1.Sub1'>], given that sub.sub1.Sub1 is imported first, nor why only sub.sub2.Sub2 appears in __subclasses__() while sub.sub1.Sub1 does not.
Any explanation that would help me understand how Python works in this regard will be greatly appreciated!
EDIT: I would like to run the module using python base.py, so maybe I can be pointed in the correct direction for that?
Upvotes: 1
Views: 264
Reputation: 110248
You made a knot. A complicated, unneeded knot. I could figure it out - but I don't know if I can keep it in mind to explain what is going on in a clear way :-)
But one thing first: this has less to do with "inheritance detection", and all to do with the import system - which you tied in a complicated knot.
So, you get the unexpected result because when you do python base.py, the contents of base are recorded as the module named __main__ in sys.modules.
Ordinarily, Python never imports a module and runs the same code twice: upon finding an import statement that names an already-imported module, it just creates a new variable pointing to the existing module. If that module has not finished executing its body yet, not all of its classes or variables will be visible at the point of the second import statement. Calls to importlib do no better - they just don't automate the variable binding part. When you do circular imports, change the import path, and import a module named base from another file, Python does not know this is the same base that is __main__. So the new one gets a fresh import, and a second entry in sys.modules, as base.
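To see the "same file, two modules" effect in isolation, here is a minimal sketch. It does not reproduce the layout in the question: it writes a stand-in base.py to a temporary directory and executes it twice under two made-up module names (first_copy and second_copy, both hypothetical), using a small helper, load_as, that is also just an assumption for this demo. The point is only that each execution creates its own Base class object, so a subclass of one copy is unknown to the other.

```python
import importlib.util
import os
import tempfile

# Hypothetical stand-in for base.py: a module that only defines one class.
SOURCE = "class Base(object):\n    pass\n"


def load_as(name, path):
    """Execute the file at `path` as a fresh module object called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "base.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    first = load_as("first_copy", path)    # plays the role of __main__
    second = load_as("second_copy", path)  # plays the role of base


class Sub(second.Base):
    pass


print(first.Base is second.Base)     # False: two distinct class objects
print(second.Base.__subclasses__())  # contains Sub
print(first.Base.__subclasses__())   # []
```

In the question, __main__ and base play the roles of the two copies: the subclasses attach to base.Base, while the instance created at the bottom of the script belongs to __main__.Base.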
If you just print the __class__ in your inheritors method, it will be clear:
    @classmethod
    def inheritors(cls):
        print("At class {}. Subclasses: {}".format(__class__, cls.__subclasses__()))
Then you will see that base.Base has the Sub2 subclass and __main__.Base has no subclasses.
Now, let me try to put together the timeline for it:
1. base.py is run as __main__ and executes up to the line b = Base(). At this point the __init__ method of __main__.Base starts importing the submodules, beginning with sub.sub1.
2. sub1 is run, changes sys.path, and re-imports base.py as the base module.
3. The body of base runs up to its own b = Base() line, so the __init__ method in base.Base is met; therein, it imports sub.sub1, and Python finds out this module has already been imported and is in sys.modules. Its code has not been completed, and the Sub1 class is not yet defined, though.
4. __init__ tries to import sub.sub2. That is a new module to Python, so it is imported.
5. In sub2, when the import of base is met, Python recognizes the module as imported already (although, again, not all of its initialization code is complete) - it just binds the name in sub2's globals, and carries on.
6. Sub2 is defined as a subclass of base.Base.
7. The sub.sub2 import finishes, and Python returns to the __init__ method from step (4); Python imports sub.sub3 and proceeds to the b.inheritors() call (from base, not from __main__). At this point the only subclass of base.Base is Sub2 - that is printed.
8. base.py as base finishes, and Python resumes executing the body of sub.sub1 - class Sub1 is defined as a subclass of base.Base.
9. Python resumes the __main__.Base.__init__ execution and imports sub.sub2 - but it has already run; the same goes for sub.sub3.
10. __main__.Base.inheritors is called in __main__, and prints no subclasses.
And that is the end of a complicated history.
First: if you need to do the sys.path.append trickery, there is something wrong with your package. Let your package be proj, and point proj.__init__ to import base if you want that to be run (and dynamically import the other modules) - but stop fiddling with sys.path to find things in your own package.
Second: the cls.__subclasses__ call is of little use, as it will only tell you about the immediate subclasses of cls - if there is a grandchild subclass it will go unnoticed.
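A short sketch of that limitation, using made-up classes (Child, GrandChild) and a hypothetical helper, all_subclasses, that walks __subclasses__() recursively. The helper is only one possible workaround and is not part of the code in the question; with diamond-shaped hierarchies it would list a class more than once.

```python
class Base(object):
    pass


class Child(Base):
    pass


class GrandChild(Child):
    pass


def all_subclasses(cls):
    """Collect direct and indirect subclasses by walking __subclasses__()."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


print(Base.__subclasses__())  # only Child; GrandChild is missing
print(all_subclasses(Base))   # Child and GrandChild
```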
The most usual pattern is to keep a registry of the subclasses of your Base - and as they are created, just add the new classes to this record. This can be done with a metaclass in Python < 3.6, or with the __init_subclass__ method on Python 3.6 and later.
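A minimal sketch of the __init_subclass__ variant, assuming Python 3.6 or later. The attribute name registry and the classes Sub1Child and Unrelated are made up for illustration; only the hook itself is standard.

```python
class Base(object):
    # Hypothetical registry attribute: every subclass is recorded here
    # as soon as its class statement has been executed.
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.registry.append(cls)


class Sub1(Base):
    pass


class Sub1Child(Sub1):  # grandchildren are recorded as well
    pass


class Unrelated(object):  # does not inherit from Base, so it is not recorded
    pass


print(Base.registry)
```

A class is only registered once the module defining it has actually been imported, so the dynamic imports are still needed. The registry also lives on one particular Base object, so it does not cure the duplicated __main__ / base module problem described above.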
Upvotes: 3