Reputation: 40948

Python import search path: what happens first?

There seems to be slightly ambiguous wording in two parts of the Python docs regarding imports.

From "The Module Search Path":

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path.

From "The Module Cache":

The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths.

Which of these is a more accurate representation of what happens internally with Python's import system? The logic below would say that they can't coexist, since sys.modules could very well contain modules that aren't builtin, and could exclude some modules that are.

Here's where my confusion stems from:

sys.modules is for caching modules that have already been imported; it's not expressly for storing a comprehensive list of built-in modules. (The closest thing to that, I think, is sys.builtin_module_names, but that also doesn't include stuff that has a .__file__ attribute such as math.)

If I start up a new interpreter session, sys.modules contains most builtins, but excludes some stuff from sys.builtin_module_names: namely, gc and time, among others. Additionally, you can make imports of 3rd party packages, which will be placed into sys.modules, and at that point sys.modules is certainly no longer a dictionary containing only built-in modules. So, all of that would seem to say, "sys.modules != built in modules."
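The comparison described above can be reproduced with a short sketch like the one below. The variable names are illustrative, and the exact contents of both sets depend on the Python version, the platform, and whatever has already been imported in the session.

```python
import sys

builtin_names = set(sys.builtin_module_names)
loaded_names = set(sys.modules)

# Built-in modules that have not been imported yet in this session.
builtin_not_loaded = sorted(builtin_names - loaded_names)
# Loaded modules that are not compiled into the interpreter.
loaded_not_builtin = sorted(loaded_names - builtin_names)

print("built-in but not in sys.modules:", builtin_not_loaded)
print("in sys.modules but not built-in (first 10):", loaded_not_builtin[:10])
```

If either list is non-empty, the two collections are not the same thing, which is the point being made here.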

Upvotes: 3

Answers (3)

abarnert

Reputation: 366213

You're looking at two completely different sources of information, the tutorial and the language reference.

The tutorial section "The Module Search Path" describes only the default behavior, and only what happens when a module is actually imported.

If the module is already in the cache, this process doesn't happen. That's not explained here, because it's already covered in the previous section, More on Modules:

A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module name is encountered in an import statement.

...

Note: For efficiency reasons, each module is only imported once per interpreter session.

It doesn't explain the mechanism by which this happens, because this is just a tutorial.

Meanwhile, in the reference docs for the import system, the module cache section explains the first thing that happens on an import statement.

Notice that it's not exactly true that Python avoids executing the module's statements if the module has already been imported, or that it's only imported once for efficiency. That's a consequence of the fact that the default loaders put the module in the sys.modules cache. And if you replace the loaders, or monkey with the cache after the fact, a module will in fact be imported and executed multiple times.
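A minimal sketch of that last point, assuming a throwaway module written to a temporary directory (the name cache_demo_mod is hypothetical and exists only for this demonstration). Each run of the module body creates a fresh TOKEN object, so comparing tokens shows whether the body ran again.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "cache_demo_mod"  # hypothetical name, used only in this sketch

with tempfile.TemporaryDirectory() as tmpdir:
    # Every execution of the module body creates a new TOKEN object.
    with open(os.path.join(tmpdir, MODULE_NAME + ".py"), "w") as f:
        f.write("TOKEN = object()\n")

    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        first = importlib.import_module(MODULE_NAME)
        cached = importlib.import_module(MODULE_NAME)  # served from sys.modules
        del sys.modules[MODULE_NAME]                   # tamper with the cache
        second = importlib.import_module(MODULE_NAME)  # module body runs again
    finally:
        # Clean up so the demo leaves no trace in the session.
        sys.path.remove(tmpdir)
        sys.modules.pop(MODULE_NAME, None)

same_object_when_cached = cached is first
reexecuted_after_eviction = second.TOKEN is not first.TOKEN
print(same_object_when_cached, reexecuted_after_eviction)
```

The second import returns the very same module object, while the import after the cache entry is deleted produces a new module with a new TOKEN.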

Subsequent sections—starting with the next section, Finders and loaders—similarly describe the details of how the module is found, more rigorously and in more detail than the Module Search Path section of the tutorial:

Python includes a number of default finders and importers. The first one knows how to locate built-in modules, and the second knows how to locate frozen modules. A third default finder searches an import path for modules.

So again, it's not exactly true that the interpreter first searches for a built-in module. Instead, the interpreter just searches its finders in order, and by default, the first finder is the built-in module finder. But if you change the list of finders, Python won't search for built-ins first.
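As a rough illustration of "searches its finders in order", the sketch below walks sys.meta_path and reports the first finder that returns a spec for a given name. The helper first_claiming_finder is a hypothetical name, not part of importlib, and the real import machinery does more than this (parent packages, locking, the cache check).

```python
import sys


def first_claiming_finder(name):
    """Return (finder, spec) for the first sys.meta_path entry that finds name."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:  # skip legacy finders without find_spec()
            continue
        spec = find_spec(name, None)
        if spec is not None:
            return finder, spec
    return None, None


for name in ("sys", "json"):
    finder, spec = first_claiming_finder(name)
    print(name, "->", finder, "origin:", spec.origin if spec else None)
```

Reordering sys.meta_path, or inserting a finder at the front, changes which entry answers first.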

In fact, if you print out sys.meta_path on a default installation of CPython 3.7, what you'll see is:

<class '_frozen_importlib.BuiltinImporter'>
<class '_frozen_importlib.FrozenImporter'>
<class '_frozen_importlib_external.PathFinder'>

(Under IPython, or if you've imported something like six that helps rename modules, or if you've imported something like requests that embeds versioned modules, you'll have a couple of extra finders.)

That BuiltinImporter is documented in the importlib library docs. (If you're wondering why it's not called BuiltinFinder, a finder that's also its own loader is called an importer.) What it actually does is look at sys.builtin_module_names and call an implementation-specific function to handle anything found there.
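The relationship between BuiltinImporter and sys.builtin_module_names can be probed directly. This is only a sketch using the public importlib.machinery.BuiltinImporter class; the two sample names are arbitrary, and whether a given name is built in depends on how the interpreter was compiled.

```python
import sys
from importlib.machinery import BuiltinImporter

for name in ("sys", "json"):
    listed = name in sys.builtin_module_names
    spec = BuiltinImporter.find_spec(name)
    print(name, "listed:", listed, "claimed by BuiltinImporter:", spec is not None)

# Specs produced by BuiltinImporter carry the origin string "built-in".
builtin_origin = BuiltinImporter.find_spec("sys").origin
print(builtin_origin)
```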

In CPython 3.6 (apologies for jumping back and forth between 3.6 and 3.7, but it shouldn't matter here…), the implementation-specific function it calls is _imp.create_builtin, and you can trace things from there.

But the key thing to notice is that not everything in builtin_module_names is actually "built-in" in the sense that it's pre-imported. For example, with a normal install, you'll probably see _ast there, but no sys.modules['_ast'].

So the create_builtin function (or, for a different implementation, whatever it uses to implement the BuiltinImporter) has to be able to import so/dll/pyd/dylib modules that come pre-installed with Python.

Upvotes: 1

pyeR_biz

Reputation: 1044

You need to distinguish between sys.path and sys.modules.

sys.modules This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.

When I display sys.modules in a Jupyter notebook, it shows a dictionary of loaded module names mapped to module objects, each showing its file location:

{'IPython': <module 'IPython' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\IPython\\__init__.py'>,
 'IPython.core': <module 'IPython.core' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\IPython\\core\\__init__.py'>,.....}

This is my module cache, but when I try

sys.modules['numpy']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-44b02d746fe5> in <module>()
----> 1 sys.modules['numpy']

KeyError: 'numpy'

Since numpy is not in my module cache, Python has to look for it in a fixed set of directories defined in sys.path, a list of strings where I can add or remove paths as I see fit.

sys.path A list of strings that specifies the search path for modules. Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

If Python finds the library in one of the sys.path directories, it creates an entry for it in sys.modules for quick access in the active session.

import numpy
sys.modules['numpy']
#<module 'numpy' from 'C:\\Users\\User\\Anaconda3\\lib\\site-packages\\numpy\\__init__.py'>

Upvotes: 0

nosklo

Reputation: 223172

When you import a module, the interpreter first searches the built-ins, then sys.path. But that is only if you're really importing the module. Before importing a module, there is a cache to search. If the module is already in the cache, it is not imported again.
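The order described here can be approximated in a few lines. This is a simplified sketch for top-level modules only, not the real implementation: simplified_import is a hypothetical helper, and it ignores parent packages and import locking. The colorsys module is used only as a small standard-library example.

```python
import importlib.util
import sys


def simplified_import(name):
    """Approximate `import name` for a top-level module."""
    # Step 1: the cache is consulted first.
    if name in sys.modules:
        return sys.modules[name]
    # Step 2: only on a cache miss are the finders asked for a spec.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    module = importlib.util.module_from_spec(spec)
    # Step 3: register the module, then execute its body.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module in the cache.
        sys.modules.pop(name, None)
        raise
    return module


colorsys = simplified_import("colorsys")
print(simplified_import("colorsys") is colorsys)
```

The second call never reaches the finders, because the name is already a key in sys.modules.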

Upvotes: 1
