Reputation: 40948
There seems to be slightly ambiguous wording in two parts of the Python docs regarding imports.
From "The Module Search Path":
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path.
From "The Module Cache":
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths.
Which of these is a more accurate representation of what happens internally with Python's import system? The logic below would say that they can't coexist, since sys.modules could very well contain modules that aren't built-in, and could exclude some modules that are.
Here's where my confusion stems from:
sys.modules is for caching modules that have already been imported; it's not expressly for storing a comprehensive list of built-in modules. (The closest thing to that, I think, is sys.builtin_module_names, but that also doesn't include modules that have a .__file__ attribute, such as math.)
If I start up a new interpreter session, sys.modules contains most built-ins, but excludes some names from sys.builtin_module_names: namely, gc and time, among others. Additionally, you can import third-party packages, which will be placed into sys.modules, and at that point sys.modules is certainly no longer a dictionary containing only built-in modules. So, all of that would seem to say, "sys.modules != built-in modules."
Upvotes: 3
Views: 2782
Reputation: 366213
You're looking at two completely different sources of information, the tutorial and the language reference.
The tutorial section The Module Search Path (besides only describing the default behavior) is also describing only what happens when a module is actually imported.
If the module is already in the cache, this process doesn't happen. That's not explained here, because it's already covered in the previous section, More on Modules:
A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement.
...
Note: For efficiency reasons, each module is only imported once per interpreter session.
It doesn't explain the mechanism by which this happens, because this is just a tutorial.
Meanwhile, in the reference docs for the import system, the module cache section explains the first thing that happens on an import statement.
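To see that sys.modules really is consulted before any searching happens, you can put a module object into the cache by hand and then import it. A minimal sketch; the name not_on_any_path is made up for this demo and does not exist anywhere on disk.

```python
import sys
import types

# Build a module object by hand. "not_on_any_path" is a hypothetical name
# that no finder could locate on disk.
fake = types.ModuleType("not_on_any_path")
fake.answer = 42
sys.modules["not_on_any_path"] = fake

import not_on_any_path            # satisfied straight from sys.modules
print(not_on_any_path.answer)     # 42
print(not_on_any_path is fake)    # True
```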
Notice that it's not exactly true that Python avoids executing the module's statements if the module has already been imported, or that it's only imported once for efficiency. That's a consequence of the fact that the default loaders put the module in the sys.modules cache. And if you replace the loaders, or monkey with the cache after the fact, a module will in fact be imported and executed multiple times.
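Here is a rough sketch of that second case: deleting an entry from sys.modules makes the next import statement find and execute the module again. The module names demo_counter and demo_reexec are hypothetical, created only for this demo (one in memory, one in a temporary directory).

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# A tiny in-memory module used only to count executions (hypothetical name).
counter = types.ModuleType("demo_counter")
counter.count = 0
sys.modules["demo_counter"] = counter

# A throwaway module on disk whose body bumps the counter (hypothetical name).
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "demo_reexec.py").write_text(
    "import demo_counter\n"
    "demo_counter.count += 1\n"
)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the path finder sees the new file

import demo_reexec
first = demo_reexec
import demo_reexec                # cache hit: the body does not run again
print(counter.count)              # 1

del sys.modules["demo_reexec"]    # tamper with the cache after the fact
import demo_reexec                # cache miss: found on sys.path and executed again
print(counter.count)              # 2
print(demo_reexec is first)       # False
```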
Subsequent sections—starting with the next section, Finders and loaders—similarly describe the details of how the module is found, more rigorously and in more detail than the Module Search Path section of the tutorial:
Python includes a number of default finders and importers. The first one knows how to locate built-in modules, and the second knows how to locate frozen modules. A third default finder searches an import path for modules.
So again, it's not exactly true that the interpreter first searches for a built-in module. Instead, the interpreter just searches its finders in order, and by default, the first finder is the built-in module finder. But if you change the list of finders, Python won't search for built-ins first.
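As a sketch of "searching its finders in order", you can insert your own finder at the front of sys.meta_path. The LoggingFinder class below is hypothetical: it only records the names it is asked about and returns None, so the import falls through to the default finders. colorsys is used simply as a small standard-library module; it is removed from the cache first so the import really goes through the finders.

```python
import sys
from importlib.abc import MetaPathFinder


class LoggingFinder(MetaPathFinder):
    """Hypothetical finder: records every name it is asked about, claims none."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # "not mine": the import system moves on to the next finder


logger = LoggingFinder()
sys.meta_path.insert(0, logger)  # now consulted before BuiltinImporter
try:
    sys.modules.pop("colorsys", None)  # make sure this is a real import
    import colorsys                   # cache miss: goes through sys.meta_path
    import colorsys                   # cache hit: no finder is consulted
finally:
    sys.meta_path.remove(logger)

print(logger.seen)  # should contain "colorsys" exactly once
```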
In fact, if you print out sys.meta_path on a default installation of CPython 3.7, what you'll see is:
<class '_frozen_importlib.BuiltinImporter'>
<class '_frozen_importlib.FrozenImporter'>
<class '_frozen_importlib_external.PathFinder'>
(Under IPython, or if you've imported something like six that helps rename modules, or if you've imported something like requests that embeds versioned modules, you'll have a couple of extra finders.)
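To check this on your own interpreter, you can print sys.meta_path and compare it against the classes exposed by importlib.machinery. A minimal sketch; since extra finders may be present (IPython, six, setuptools and similar can add their own), it only checks that the three defaults are there and in the expected relative order, rather than assuming exactly three entries.

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

# Every finder the import system will consult, in order.
for finder in sys.meta_path:
    print(finder)

# The three defaults should be present, with built-ins checked before sys.path.
defaults = [BuiltinImporter, FrozenImporter, PathFinder]
print([f in sys.meta_path for f in defaults])
```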
That BuiltinImporter is documented in the importlib library docs. (If you're wondering why it's not called BuiltinFinder, a finder that's also its own loader is called an importer.) What it actually does is look at sys.builtin_module_names and call an implementation-specific function to handle anything found there.
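You can also call the built-in importer directly to see which names it claims. A minimal sketch, using sys and colorsys as one built-in and one ordinary module (colorsys is just an example choice); BuiltinImporter.find_spec returns a spec only for names in the interpreter's built-in table, and None otherwise.

```python
import sys
from importlib.machinery import BuiltinImporter

# Ask the built-in importer directly, bypassing sys.modules and the other finders.
for name in ("sys", "colorsys"):
    spec = BuiltinImporter.find_spec(name)
    print(
        f"{name}: in builtin_module_names={name in sys.builtin_module_names}, "
        f"spec origin={spec.origin if spec else None}"
    )

# For a built-in, the spec's loader is the importer itself (finder + loader = importer).
print(BuiltinImporter.find_spec("sys").loader is BuiltinImporter)
```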
In CPython 3.6 (apologies for jumping back and forth between 3.6 and 3.7, but it shouldn't matter here…), the implementation-specific function it calls is _imp.create_builtin, and you can trace things from there.
But the key thing to notice is that not everything in builtin_module_names is actually "built-in" in the sense that it's pre-imported. For example, with a normal install, you'll probably see _ast there, but no sys.modules['_ast'].
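A minimal sketch of that point, using the same _ast example (CPython-specific; whether _ast is already in sys.modules depends on what has been imported so far in the session):

```python
import importlib
import sys

name = "_ast"  # compiled into CPython, but not necessarily imported yet
print(name in sys.builtin_module_names)  # expected True on CPython
print(name in sys.modules)               # may well be False in a fresh session

module = importlib.import_module(name)   # BuiltinImporter creates and initializes it now
print(name in sys.modules)               # True
print(module.__spec__.origin)            # built-in
```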
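So the create_builtin function (or, for a different implementation, whatever it uses to implement the BuiltinImporter) has to be able to create and initialize modules that are compiled into the interpreter but haven't been imported yet. Extension modules that ship as separate so/dll/pyd/dylib files aren't in builtin_module_names at all; those are found on sys.path by PathFinder.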
Upvotes: 1
Reputation: 1044
You need to distinguish between sys.path and sys.modules.
sys.modules: This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.
When I inspect sys.modules in a Jupyter notebook, it displays a dictionary mapping the names of loaded modules to module objects, each showing its file location:
{'IPython': <module 'IPython' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\IPython\\__init__.py'>,
'IPython.core': <module 'IPython.core' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\IPython\\core\\__init__.py'>,.....}
This is my module cache, but when I try
sys.modules['numpy']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-44b02d746fe5> in <module>()
----> 1 sys.modules['numpy']
KeyError: 'numpy'
Since numpy is not in my module cache, Python will look for it in the directories listed in sys.path, a list of strings that I can add paths to or remove paths from as I see fit.
sys.path: A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
If Python finds the library in one of the sys.path directories, it imports it and adds an entry for it to sys.modules, so later imports in the same session get it straight from the cache.
import sys
import numpy
sys.modules['numpy']
#<module 'numpy' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\numpy\\__init__.py'>
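The sys.path search step can be tried in isolation with importlib.machinery.PathFinder, the default finder that walks sys.path. A minimal sketch; colorsys stands in for numpy here (an assumption, so the example runs without third-party packages).

```python
from importlib.machinery import PathFinder

# PathFinder walks the directories in sys.path: no sys.modules check, no built-ins.
spec = PathFinder.find_spec("colorsys")
print(spec.origin)  # full path of colorsys inside one of the sys.path entries

# "sys" is compiled into the interpreter, so there is no file for PathFinder to find.
print(PathFinder.find_spec("sys"))  # None, unless a stray sys.py sits on sys.path
```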
Upvotes: 0
Reputation: 223172
When you actually import a module, the interpreter first searches the built-ins and then sys.path. But that only happens if you're really importing the module. Before importing a module, there is a cache to search: if the module is already in the cache, it is not imported again.
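The order described in this answer can be sketched as a small function. simplified_import is a hypothetical helper, not part of importlib; it handles only top-level, non-package names and skips the locking, parent-package, and error handling that the real import system performs.

```python
import importlib.util
import sys


def simplified_import(name):
    """Hypothetical sketch of the default lookup order for a top-level module."""
    # 1. The module cache is checked before any searching happens.
    if name in sys.modules:
        return sys.modules[name], "sys.modules"

    # 2. Otherwise each finder on sys.meta_path is asked in order
    #    (by default: built-ins, then frozen modules, then sys.path).
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        spec = find_spec(name, None)
        if spec is None:
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module           # cache first, then run the body
        try:
            spec.loader.exec_module(module)
        except BaseException:
            sys.modules.pop(name, None)      # don't leave a half-initialized module
            raise
        # Default finders are classes (use __name__); others may be instances.
        return module, getattr(finder, "__name__", type(finder).__name__)

    raise ModuleNotFoundError(f"No module named {name!r}")


# Usage: the first call goes through the finders, the second hits the cache.
sys.modules.pop("colorsys", None)
module, source = simplified_import("colorsys")
print(source)   # name of the finder that located it, normally PathFinder
module, source = simplified_import("colorsys")
print(source)   # sys.modules
```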
Upvotes: 1