Reputation: 1015

Python import coding style

I've discovered a new pattern. Is this pattern well known or what is the opinion about it?

Basically, I have a hard time scrubbing up and down source files to figure out what module imports are available and so forth, so now, instead of

import foo
from bar.baz import quux

def myFunction():
    foo.this.that(quux)

I move all my imports into the function where they're actually used, like this:

def myFunction():
    import foo
    from bar.baz import quux

    foo.this.that(quux)

This does a few things. First, I rarely accidentally pollute my modules with the contents of other modules. I could set the __all__ variable for the module, but then I'd have to update it as the module evolves, and that doesn't help the namespace pollution for code that actually lives in the module.

Second, I rarely end up with a litany of imports at the top of my modules, half or more of which I no longer need because I've refactored it. Finally, I find this pattern MUCH easier to read, since every referenced name is right there in the function body.

Upvotes: 77

Answers (10)

MSeifert

Reputation: 152647

Both variants have their uses. However, in most cases it's better to import outside of the functions, not inside of them.

Performance

It has been mentioned in several answers but in my opinion they all lack a complete discussion.

The first time a module is imported in a Python interpreter it will be slow, no matter whether the import is at the top level or inside a function. It's slow because Python (I'm focusing on CPython; it could be different for other Python implementations) does multiple steps:

Locates the package.
Checks if the package was already converted to bytecode (the famous __pycache__ directory or the .pyc files) and, if not, converts it to bytecode.
Python loads the bytecode.
The loaded module is put in sys.modules.
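The caching described in these steps is easy to observe. Below is a small sketch using only the standard library. import_twice is a hypothetical helper that checks that a second import returns the very object stored in sys.modules. importlib.util.cache_from_source shows where CPython would write the bytecode for a given .py file. The path "pkg/mod.py" is just an illustrative name and does not need to exist.

```python
import importlib
import importlib.util
import sys


def import_twice(name: str) -> bool:
    """Import `name` twice and report whether both calls gave the cached object."""
    first = importlib.import_module(name)   # may do the full locate/compile/load work
    cached = sys.modules[name]              # the loaded module is registered here
    second = importlib.import_module(name)  # found in sys.modules, not executed again
    return first is cached is second


def bytecode_path(source_path: str) -> str:
    """Return where CPython would cache the compiled bytecode for a .py file."""
    return importlib.util.cache_from_source(source_path)


if __name__ == "__main__":
    print(import_twice("json"))
    print(bytecode_path("pkg/mod.py"))  # a path inside pkg/__pycache__/ ending in .pyc
```

The exact bytecode file name includes an interpreter-specific tag, so it differs between Python versions.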

Subsequent imports won't have to do all of these because Python can simply return the module from sys.modules. So subsequent imports will be much faster.

It might be that a function in your module isn't actually used very often but depends on an import that takes quite long. Then you could actually move the import inside the function. That will make importing your module faster, because it doesn't have to import the long-loading package immediately. However, when the function is finally used it will be slow on the first call, because then the module has to be imported. That may have an impact on the perceived performance: instead of slowing down all users, you only slow down those who use the function that depends on the slow-loading dependency.
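To make the "slow on the first call only" effect concrete, here is a sketch. It writes a hypothetical module named heavy_dep into a temporary directory and fakes an expensive import with time.sleep(0.2). It then times two calls of a function that imports heavy_dep in its body. The module name and the delay are assumptions chosen purely for illustration.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path

# "heavy_dep" is a hypothetical stand-in for a slow-to-import dependency.
HEAVY_SOURCE = "import time\ntime.sleep(0.2)  # simulate expensive import work\nVALUE = 42\n"


def rarely_used() -> int:
    import heavy_dep  # deferred: the import cost is paid on the first call only
    return heavy_dep.VALUE


def demo_deferred_import():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "heavy_dep.py").write_text(HEAVY_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            loaded_before = "heavy_dep" in sys.modules
            start = time.perf_counter()
            rarely_used()  # slow: actually imports heavy_dep
            first_call = time.perf_counter() - start
            start = time.perf_counter()
            rarely_used()  # fast: heavy_dep is already in sys.modules
            second_call = time.perf_counter() - start
            loaded_after = "heavy_dep" in sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("heavy_dep", None)
    return loaded_before, loaded_after, first_call, second_call


if __name__ == "__main__":
    before, after, first, second = demo_deferred_import()
    print(f"loaded before first call: {before}, after: {after}")
    print(f"first call {first:.3f}s, second call {second:.6f}s")
```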

However the lookup in sys.modules isn't free. It's very fast, but it's not free. So if you actually call a function that imports a package very often you will notice a slightly degraded performance:

import random
import itertools

def func_1():
    return random.random()

def func_2():
    import random
    return random.random()

def loopy(func, repeats):
    for _ in itertools.repeat(None, repeats):
        func()

%timeit loopy(func_1, 10000)
# 1.14 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit loopy(func_2, 10000)
# 2.21 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That's almost two times slower.
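%timeit is an IPython magic. If you want to reproduce this comparison in a plain script, the standard-library timeit module does the same job. The sketch below mirrors func_1 and func_2 from above. compare is a hypothetical helper, and the absolute numbers will depend on your machine and Python version.

```python
import random
import timeit


def func_1():
    return random.random()  # module-level import: global name lookup


def func_2():
    import random  # the import statement runs (a sys.modules hit) on every call
    return random.random()


def compare(number: int = 100_000, repeat: int = 5) -> tuple[float, float]:
    """Best-of-`repeat` timings in seconds for `number` calls of each function."""
    t1 = min(timeit.repeat(func_1, number=number, repeat=repeat))
    t2 = min(timeit.repeat(func_2, number=number, repeat=repeat))
    return t1, t2


if __name__ == "__main__":
    top, inline = compare()
    print(f"top-level import: {top:.4f}s, in-function import: {inline:.4f}s")
```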

It's very important to realize that aaronasterling "cheated" a bit in his answer. He stated that doing the import in the function actually makes the function faster, and to some extent this is true. That's because of how Python looks up names:

It checks the local scope first.
It checks the surrounding scope next.
Then the next surrounding scope is checked.
...
The global scope is checked.
Finally, the builtins are checked.
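This lookup order is commonly summarised as LEGB: local, enclosing, global, builtins. The tiny sketch below has one function that resolves a name at each level. The variable x and the function names are purely illustrative.

```python
import builtins

x = "global"


def outer():
    x = "enclosing"

    def inner():
        return x  # no local x: the enclosing function's x is found

    return inner()


def uses_global():
    return x  # neither local nor enclosing: the module-level x is found


def uses_builtin():
    return len  # not local, enclosing or global: found in the builtins module


if __name__ == "__main__":
    print(outer(), uses_global(), uses_builtin() is builtins.len)
```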

So instead of checking the local scope and then the global scope, it suffices to check the local scope, because the name of the module is available there. That actually makes it faster! But that's a technique called "loop-invariant code motion". It basically means that you reduce the overhead of something that is done in a loop (or repeatedly) by storing it in a variable before the loop (or the repeated calls). So instead of importing it in the function, you could also simply create a local variable and assign the global name to it:

import random
import itertools

def f1(repeats):
    "Repeated global lookup"
    for _ in itertools.repeat(None, repeats):
        random.random()

def f2(repeats):
    "Import once then repeated local lookup"
    import random
    for _ in itertools.repeat(None, repeats):
        random.random()

def f3(repeats):
    "Assign once then repeated local lookup"
    local_random = random
    for _ in itertools.repeat(None, repeats):
        local_random.random()

%timeit f1(10000)
# 588 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit f2(10000)
# 522 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit f3(10000)
# 527 µs ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

While you can clearly see that doing repeated lookups of the global random is slow, there's virtually no difference between importing the module inside the function and assigning the global module to a variable inside the function.

This could be taken to extremes by also avoiding the function lookup inside the loop:

def f4(repeats):
    from random import random
    for _ in itertools.repeat(None, repeats):
        random()

def f5(repeats):
    r = random.random
    for _ in itertools.repeat(None, repeats):
        r()

%timeit f4(10000)
# 364 µs ± 9.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit f5(10000)
# 357 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Again much faster but there's almost no difference between the import and the variable.

Optional dependencies

Sometimes having a module-level import can actually be a problem, for example if you don't want to add another install-time dependency but the module would be really helpful for some additional functionality. Deciding whether a dependency should be optional shouldn't be done lightly. It will affect the users: either they get an unexpected ImportError or they miss out on the "cool features". It also makes installing the package with all features more complicated. For normal dependencies, pip or conda (just to mention two package managers) work out of the box, but for optional dependencies the users have to install packages manually later on. There are some options that make it possible to customize the requirements, but then again the burden of installing it "correctly" is put on the user.

But again this could be done in both ways:

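try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None  # matplotlib is optional; remember that it is missing

def function_that_requires_matplotlib():
    if plt is None:
        raise ImportError("function_that_requires_matplotlib requires matplotlib")
    plt.plot()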

or:

def function_that_requires_matplotlib():
    import matplotlib.pyplot as plt
    plt.plot()

This could be more customized by providing alternative implementations or customizing the exception (or message) the user sees but this is the main gist.

The top-level approach could be a bit better if one wants to provide an alternative "solution" for the optional dependency. However, people generally use the in-function import, mostly because it leads to a cleaner stack trace and is shorter.
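The two customizations mentioned above can be sketched with in-function imports: raising a clearer error, and falling back to an alternative implementation. The package names fancy_plotting_lib and fast_stats_ext, and their plot/mean functions, are hypothetical placeholders, not real libraries.

```python
def function_that_requires_plotting():
    try:
        import fancy_plotting_lib  # hypothetical optional dependency
    except ImportError as exc:
        # Re-raise with a message that tells the user what to install.
        raise ImportError(
            "function_that_requires_plotting() needs the optional package "
            "'fancy_plotting_lib'; install it to enable this feature"
        ) from exc
    return fancy_plotting_lib.plot()  # hypothetical API of that package


def mean(values):
    try:
        import fast_stats_ext  # hypothetical accelerated implementation
    except ImportError:
        return sum(values) / len(values)  # pure-Python fallback
    return fast_stats_ext.mean(values)  # hypothetical API
```

Chaining with "from exc" keeps the original ModuleNotFoundError visible in the traceback while putting the actionable message first.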

Circular Imports

In-function imports can be very helpful to avoid ImportErrors due to circular imports. In lots of cases, circular imports are a sign of a "bad" package structure. But if there is absolutely no way to avoid a circular import, the "circle" (and thus the problem) is broken by putting the imports that close the circle inside the functions that actually use them.
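Here is a self-contained sketch of such a circle. It writes two hypothetical modules, cyc_a and cyc_b, into a temporary directory. cyc_a imports cyc_b at the top. When cyc_b also imports cyc_a at the top, importing cyc_a fails, because cyc_a is only half-initialised at that moment. Moving cyc_b's import into its function breaks the circle.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# cyc_a and cyc_b are hypothetical modules written to a temporary directory.
MODULE_A = """\
from cyc_b import helper_b  # top-level import of cyc_b


def helper_a():
    return "a"


def use_b():
    return helper_b()
"""

# cyc_b imports cyc_a inside the function, so the cycle is only closed at call time.
B_IN_FUNCTION = """\
def helper_b():
    from cyc_a import helper_a  # cyc_a is fully loaded by the time this runs
    return "b+" + helper_a()
"""

# cyc_b imports cyc_a at the top, while cyc_a is still half-initialised.
B_TOP_LEVEL = """\
from cyc_a import helper_a


def helper_b():
    return "b+" + helper_a()
"""


def run_pair(b_source: str) -> str:
    """Write both modules, import cyc_a and call cyc_a.use_b()."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cyc_a.py").write_text(MODULE_A)
        Path(tmp, "cyc_b.py").write_text(b_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return importlib.import_module("cyc_a").use_b()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("cyc_a", None)
            sys.modules.pop("cyc_b", None)


if __name__ == "__main__":
    print(run_pair(B_IN_FUNCTION))
    try:
        run_pair(B_TOP_LEVEL)
    except ImportError as exc:
        print("top-level version fails:", exc)
```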

Don't repeat yourself

If you actually put all imports in the function instead of the module scope you will introduce redundancy, because it's likely that functions require the same imports. That has a few disadvantages:

You now have multiple places to check if any import has become obsolete.
In case you misspelled some import, you'll only find out when you run the specific function and not at load time. Because you have more import statements, the likelihood of a mistake increases (not much), and it just becomes a tiny bit more essential to test all functions.
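A minimal illustration of the "only when you run the specific function" point follows. The misspelled name jsonn is a deliberate typo, and the example assumes no module by that name is installed. Defining the function succeeds, and the error appears only on the first call. The same typo at the top of a module would fail as soon as the module is imported.

```python
def uses_misspelled_import():
    import jsonn  # typo for "json"; assumes no module named "jsonn" is installed
    return jsonn.dumps({})


def typo_surfaces_at_call_time() -> bool:
    """Defining the function above succeeded; only calling it hits the typo."""
    try:
        uses_misspelled_import()
    except ModuleNotFoundError:
        return True
    return False


if __name__ == "__main__":
    print(typo_surfaces_at_call_time())
```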

Additional thoughts:

I rarely end up with a litany of imports at the top of my modules, half or more of which I no longer need because I've refactored it.

Most IDEs already have a checker for unused imports, so that's probably just a few clicks to remove them. Even if you don't use an IDE you can use a static code checker script once in a while and fix it manually. Another answer mentioned pylint, but there are others (for example pyflakes).
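To give an idea of what such checkers do, here is a deliberately naive sketch built on the standard-library ast module. It is not how pyflakes or pylint are implemented. It ignores __all__, re-exports, names used only in strings, and scoping, all of which real tools handle.

```python
import ast


def unused_imports(source: str) -> list:
    """Return names bound by import statements that are never read in `source`."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    loaded = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(imported - loaded)


if __name__ == "__main__":
    example = "import os\nimport sys\nfrom json import dumps, loads\nprint(os.getcwd(), dumps({}))\n"
    print(unused_imports(example))  # ['loads', 'sys']
```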

I rarely accidentally pollute my modules with the contents of other modules

That's why you typically use __all__ and/or define your functions in submodules and only import the relevant classes/functions/... in the main module, for example the __init__.py.

Also if you think you polluted the module namespace too much then you probably should consider splitting the module into submodules, however that only makes sense for dozens of imports.

One additional (very important) point if you want to reduce namespace pollution is to avoid from module import * imports. But you may also want to avoid from module import a, b, c, d, e, ... imports that import too many names. Instead, just import the module and access the functions with module.c.

As a last resort, you can always use aliases to avoid polluting the namespace with "public" imports, for example import random as _random. That will make the code harder to understand, but it makes it very clear what should be publicly visible and what shouldn't. It's not something I would recommend; you should just keep the __all__ list up to date (which is the recommended and sensible approach).
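This sketch shows how __all__ and an underscore alias each affect what from module import * binds. It builds a throwaway module from a source string. The module name demo_all_module and the functions roll and helper are hypothetical. Neither mechanism stops explicit access such as module.random; they only shape the star import and signal intent.

```python
import sys
import types

# Source for a hypothetical module: random is imported under a public-looking
# name, itertools under an underscore alias.
BODY = '''
import random
import itertools as _itertools


def roll():
    return random.randint(1, 6)


def helper():
    return list(_itertools.repeat(0, 3))
'''

WITH_ALL = BODY + '\n__all__ = ["roll"]\n'


def star_import_names(source: str, module_name: str = "demo_all_module") -> list:
    """Build a module from `source` and return what `from module import *` binds."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module  # star imports are resolved via sys.modules
    try:
        namespace: dict = {}
        exec(f"from {module_name} import *", namespace)
    finally:
        del sys.modules[module_name]
    return sorted(name for name in namespace if name != "__builtins__")


if __name__ == "__main__":
    print(star_import_names(BODY))      # random leaks out; _itertools does not
    print(star_import_names(WITH_ALL))  # only the names listed in __all__
```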

Summary

The performance impact is visible, but almost always it's a micro-optimization, so don't let micro-benchmarks guide where you put the imports. The exception is when the dependency is really slow on first import and it's only used for a small subset of the functionality. Then it can actually have a visible impact on the perceived performance of your module for most users.
Use the commonly understood tool for defining the public API, namely the __all__ variable. It might be a bit annoying to keep it up to date. But so is checking all functions for obsolete imports, or adding all the relevant imports to each new function. In the long run you'll probably have to do less work by updating __all__.
It really doesn't matter which one you prefer; both work. If you're working alone, you can weigh the pros and cons and do whichever you think is best. However, if you work in a team you probably should stick to known patterns (which would be top-level imports with __all__), because that lets your teammates keep doing what they (probably) have always done.

Upvotes: 4

aaronasterling

Reputation: 71004

The (previously) top-voted answer to this question is nicely formatted but absolutely wrong about performance. Let me demonstrate:

Performance

Top Import

import random

def f():
    L = []
    for i in xrange(1000):
        L.append(random.random())


for i in xrange(1000):
    f()

$ time python import.py

real        0m0.721s
user        0m0.412s
sys         0m0.020s

Import in Function Body

def f():
    import random
    L = []
    for i in xrange(1000):
        L.append(random.random())

for i in xrange(1000):
    f()

$ time python import2.py

real        0m0.661s
user        0m0.404s
sys         0m0.008s

As you can see, it can be more efficient to import the module in the function. The reason for this is simple. It moves the reference from a global reference to a local reference. This means that, for CPython at least, the compiler will emit LOAD_FAST instructions instead of LOAD_GLOBAL instructions. These are, as the name implies, faster. The other answerer artificially inflated the performance hit of looking in sys.modules by importing on every single iteration of the loop.
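The LOAD_FAST versus LOAD_GLOBAL difference can be checked with the standard-library dis module. The sketch below is a Python 3 sketch (the timings above use Python 2's xrange). It uses math rather than random so that the module name doesn't collide with an attribute name. lookup_kinds is a hypothetical helper, and it matches opcode names by substring because recent CPython versions add variants of LOAD_FAST.

```python
import dis
import math


def global_lookup():
    return math.sqrt(2.0)  # "math" resolved in the module's globals


def local_lookup():
    import math  # binds "math" as a local variable of this function
    return math.sqrt(2.0)


def lookup_kinds(func, name: str) -> set:
    """Classify how `func`'s bytecode loads `name`: "global", "local", or both."""
    kinds = set()
    for ins in dis.get_instructions(func):
        # Some newer opcodes pack two variable names into one tuple argument.
        argvals = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
        if name not in argvals:
            continue
        # Match by substring: recent CPython versions add variants such as
        # LOAD_FAST_CHECK or LOAD_FAST_BORROW alongside plain LOAD_FAST.
        if "LOAD_GLOBAL" in ins.opname:
            kinds.add("global")
        elif "LOAD_FAST" in ins.opname:
            kinds.add("local")
    return kinds


if __name__ == "__main__":
    dis.dis(local_lookup)
    print(lookup_kinds(global_lookup, "math"), lookup_kinds(local_lookup, "math"))
```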

As a rule, it's best to import at the top, but performance is not the reason if you access the module many times. The reasons are that one can keep track of what a module depends on more easily, and that doing so is consistent with most of the rest of the Python universe.

Upvotes: 130

Russell Bryant

Reputation: 1901

Another useful thing to note is that the from module import * syntax inside of a function has been removed in Python 3.0.

There is a brief mention of it under "Removed Syntax" here:

http://docs.python.org/3.0/whatsnew/3.0.html
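A quick way to confirm this on a current Python 3 interpreter is to compile both forms with the built-in compile function. compiles is a hypothetical helper, and os.path is used only as a convenient module to star-import from.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


MODULE_LEVEL = "from os.path import *\n"
IN_FUNCTION = "def f():\n    from os.path import *\n"

if __name__ == "__main__":
    print(compiles(MODULE_LEVEL))  # True: star import at module level is allowed
    print(compiles(IN_FUNCTION))   # False: star import inside a function is rejected
```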

Upvotes: 11

fuentesjr

Reputation: 52328

I believe this is a recommended approach in some cases/scenarios. For example in Google App Engine lazy-loading big modules is recommended since it will minimize the warm-up cost of instantiating new Python VMs/interpreters. Have a look at a Google Engineer's presentation describing this. However keep in mind this doesn't mean you should lazy-load all your modules.

Upvotes: 3

Ryan

Reputation: 15326

This does have a few disadvantages.

Testing

On the off chance you want to test your module through runtime modification, it may make it more difficult. Instead of doing

import mymodule
mymodule.othermodule = module_stub

You'll have to do

import othermodule
othermodule.foo = foo_stub

This means that you'll have to patch the othermodule globally, as opposed to just change what the reference in mymodule points to.
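Here is a sketch of the same contrast using unittest.mock. json stands in for othermodule, and a throwaway module object acts as the stub. With the top-level import, patching the name in the module's own namespace is enough. With the in-function import, the stub has to be placed in sys.modules, which affects the whole process for the duration of the patch. The function names are hypothetical.

```python
import json
import sys
import types
from unittest import mock


def dump_top(obj):
    return json.dumps(obj)  # uses this module's global name "json"


def dump_inline(obj):
    import json  # fetches json from sys.modules on every call
    return json.dumps(obj)


def demo():
    fake = types.ModuleType("fake_json")  # stand-in module for the test
    fake.dumps = lambda obj: "stub"

    # Top-level import: rebinding the name in this module's namespace is enough.
    with mock.patch.dict(globals(), {"json": fake}):
        top = dump_top({})
        inline = dump_inline({})  # unaffected: still imports the real json

    # In-function import: the stub must go into sys.modules, i.e. process-wide.
    with mock.patch.dict(sys.modules, {"json": fake}):
        inline_patched = dump_inline({})

    return top, inline, inline_patched


if __name__ == "__main__":
    print(demo())
```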

Dependency Tracking

This makes it non-obvious what modules your module depends on. This is especially irritating if you use many third party libraries or are re-organizing code.

I had to maintain some legacy code that used inline imports all over the place, and it made the code extremely difficult to refactor or repackage.
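If you inherit such code, a rough inventory of imports per scope can help. This sketch uses the standard-library ast module and reports which modules are imported at module level and which inside each function. imports_by_scope is a hypothetical helper. Methods are reported under their bare names, and class bodies count as the enclosing scope.

```python
import ast


def imports_by_scope(source: str) -> dict:
    """Map each scope ("<module>" or a function name) to the modules it imports."""
    found = {}

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, child.name)  # imports in here belong to the function
            elif isinstance(child, ast.Import):
                found.setdefault(scope, []).extend(a.name for a in child.names)
            elif isinstance(child, ast.ImportFrom):
                # level > 0 marks a relative import such as "from . import x"
                module = "." * child.level + (child.module or "")
                found.setdefault(scope, []).append(module)
            else:
                visit(child, scope)

    visit(ast.parse(source), "<module>")
    return found


if __name__ == "__main__":
    example = "import os\n\ndef myFunction():\n    import foo\n    from bar.baz import quux\n"
    print(imports_by_scope(example))
```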

Notes On Performance

Because of the way Python caches modules, there isn't a performance hit. In fact, since the module is in the local namespace, there is a slight performance benefit to importing modules in a function.

Top Import

import random

def f():
    L = []
    for i in xrange(1000):
        L.append(random.random())

for i in xrange(10000):
    f()


$ time python test.py 

real   0m1.569s
user   0m1.560s
sys    0m0.010s

Import in Function Body

def f():
    import random
    L = []
    for i in xrange(1000):
        L.append(random.random())

for i in xrange(10000):
    f()

$ time python test2.py

real    0m1.385s
user    0m1.380s
sys     0m0.000s

Upvotes: 57

dbr

Reputation: 169563

People have explained very well why to avoid inline-imports, but not really alternative workflows to address the reasons you want them in the first place.

I have a hard time scrubbing up and down source files to figure out what module imports are available and so forth

To check for unused imports I use pylint. It does static(ish) analysis of Python code, and one of the (many) things it checks for is unused imports. For example, the following script:

import urllib
import urllib2

urllib.urlopen("http://stackoverflow.com")

would generate the following message:

example.py:2 [W0611] Unused import urllib2

As for checking available imports, I generally rely on TextMate's (fairly simplistic) completion: when you press Esc, it completes the current word with others in the document. If I have done import urllib, urll[Esc] will expand to urllib; if not, I jump to the start of the file and add the import.

Upvotes: 3

nikow

Reputation: 21548

I would suggest that you try to avoid from foo import bar imports. I only use them inside packages, where the splitting into modules is an implementation detail and there won't be many of them anyway.

In all other places where you import a package, just use import foo and then reference it by the full name foo.bar. This way you can always tell where a certain element comes from, and you don't have to maintain the list of imported elements (in reality this list will always be outdated and import elements that are no longer used).

If foo is a really long name you can simplify it with import foo as f and then write f.bar. This is still far more convenient and explicit than maintaining all the from imports.

Upvotes: 5

RSabet

Reputation: 6160

You might want to take a look at Import statement overhead in the Python wiki. In short: if the module has already been loaded (look at sys.modules), your code will run slower. If your module hasn't been loaded yet, foo will only get loaded when needed, which can be zero times, and then the overall performance will be better.

Upvotes: 1

dF.

Reputation: 75785

A few problems with this approach:

It's not immediately obvious when opening the file which modules it depends on.
It will confuse programs that have to analyze dependencies, such as py2exe, py2app etc.
What about modules that you use in many functions? You will either end up with a lot of redundant imports or you'll have to have some at the top of the file and some inside functions.

So... the preferred way is to put all imports at the top of the file. I've found that if my imports get hard to keep track of, it usually means I have too much code, which I'd be better off splitting into two or more files.

Some situations where I have found imports inside functions to be useful:

To deal with circular dependencies (if you really really can't avoid them)
Platform specific code
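As an example of the platform-specific case, the standard-library resource module exists only on POSIX systems. Importing it at the top of a module would make the whole module fail to import on Windows. Importing it inside the function keeps the rest of the module usable. peak_memory_kib is a hypothetical helper. It relies on ru_maxrss being reported in kilobytes on Linux and in bytes on macOS.

```python
import sys


def peak_memory_kib():
    """Peak resident memory of this process in KiB, or None where unsupported."""
    if sys.platform == "win32":
        return None  # the resource module does not exist on Windows
    import resource  # POSIX-only; a top-level import would break on Windows
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak // 1024 if sys.platform == "darwin" else peak


if __name__ == "__main__":
    print(peak_memory_kib())
```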

Also: putting imports inside each function is actually not appreciably slower than at the top of the file. The first time each module is loaded it is put into sys.modules, and each subsequent import costs only the time to look up the module, which is fairly fast (it is not reloaded).

Upvotes: 25

sykloid

Reputation: 101206

From a performance point of view, you can see this: Should Python import statements always be at the top of a module?

In general, I only use local imports in order to break dependency cycles.

Upvotes: 2
