Batman

Reputation: 8927

Most Performant Way To Do Imports

From a performance point of view (time or memory) is it better to do:

import pandas as pd

or

from pandas import DataFrame, TimeSeries

Does the best choice depend on how many classes I'm importing from the package?

Similarly, I've seen people do things like:

def foo(bar):
    from numpy import array

Why would I ever want to do an import inside a function or method definition? Wouldn't this mean that the import is performed every time the function is called? Or is this just to avoid namespace collisions?

Upvotes: 4

Views: 226

Answers (3)

Schmuddi

Reputation: 2086

There is an advantage of importing a module inside of a function that hasn't been mentioned yet: doing so gives you some control over when the module is loaded. In fact, even though @J.J's answer recommends importing all modules as early as possible, this control allows you to postpone loading the module.

Why would you want to do that? Well, while it doesn't improve the actual performance of your program, doing so can improve the perceived performance, and by virtue of this, the user experience:

In part, users perceive whether your app is fast or slow based on how long it takes to start up.

MSDN: Best practices for your app's startup performance

Loading every module at the beginning of your main script can take some time. For example, one of my apps uses the Qt framework, Pandas, Numpy, and Matplotlib. If all these modules are imported right at the beginning of the app, the appearance of the user interface is delayed by several seconds. Users don't like to wait, and they are likely to perceive your app as generally slow because of this wait.

But if, for example, Matplotlib is imported only from within those functions that are called whenever the user issues a plot command, the startup time is notably reduced. The user no longer perceives your app as sluggish, which may result in a better user experience.
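A minimal sketch of this pattern, assuming the standard-library `statistics` module as a stand-in for a heavy library such as Matplotlib (the function name `plot_summary` is made up for illustration):

```python
import sys


def plot_summary(values):
    # Deferred import: `statistics` (stdlib) stands in for a heavy library
    # such as Matplotlib. It is loaded on the first call, unless something
    # else in the process has already imported it.
    import statistics

    return statistics.mean(values)


# Hypothetical "startup" phase: the heavy module has not been requested yet.
print("loaded before first call:", "statistics" in sys.modules)

# The first user action that needs the library pays the import cost here.
result = plot_summary([1, 2, 3])
print("loaded after first call:", "statistics" in sys.modules)
```

The total work is the same; it is only moved from startup to the first use of the feature.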

Upvotes: 0

J.J

Reputation: 3607

Answering the more general question of when to import: imports are dependencies. They are code that may or may not exist, and that the program needs in order to function. It is therefore a very good idea to import that code as early as possible, to prevent dumb errors from cropping up in the middle of execution.

This is particularly true as PyPy becomes more popular, where a module might exist but not be usable under PyPy. Far better to fail early than potentially hours into the execution of the code.
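One way to sketch the fail-early idea is a startup check that looks up each dependency before any real work begins. The helper name `missing_modules` and the dependency list are made up for illustration, and `importlib.util.find_spec` only confirms that a top-level module can be located, not that it will import cleanly:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical dependency list; "no_such_module_xyz" is deliberately bogus.
REQUIRED = ["json", "no_such_module_xyz"]

missing = missing_modules(REQUIRED)
if missing:
    # A real program would probably exit here instead of just printing.
    print("missing dependencies:", ", ".join(missing))
```

Plain top-level `import` statements give the same early failure for free; a check like this is only useful when you want a single, readable error message.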

As for "import pandas as pd" vs "from pandas import DataFrame, TimeSeries", this question has multiple concerns (as all questions do), with some far more important than others. There's the question of namespace, there's the question of readability, and there's the question of performance. Performance, as Martijn states, should contribute about 0.0001% of the decision. Readability should contribute about 90%. Namespace only 10%, as it can be mitigated so easily.

Personally, I think both import X as Y and from X import Y are bad practice, because explicit is better than implicit. You don't want to be on line 2000 trying to remember which package "calculate_mean" comes from because it isn't referenced anywhere else in the code. When I first started using numpy I was copy/pasting code from the internet, and couldn't figure out why I couldn't pip install np. This obviously isn't a problem if you already know that "np" is python for "numpy", but it's a stupid and pointless confusion for the 3 letters it saves. It came from numpy. Use numpy.

Upvotes: 1

Martijn Pieters

Reputation: 1122262

This is micro-optimising, and you should not worry about this.

Modules are loaded once per Python process. All code that then imports the module only needs to bind a name to the module or to objects defined in the module. That binding is extremely cheap.

Moreover, the top-level code in your module only runs once too, so the binding takes place just once. An import in a function does the binding each time the function is run, but again, this is so cheap as to be negligible.
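To illustrate, here is a small sketch using the standard-library `json` module (the names `json_again` and `encode` are made up for the example). Every import after the first finds the module in `sys.modules` and only binds a name:

```python
import sys

import json                 # loads the module if needed and caches it
first = sys.modules["json"]

import json as json_again   # later import: cache lookup plus name binding
from json import dumps      # binds a name to an object inside the module

# Both forms refer to the same, single module object.
print(json_again is first)
print(dumps is first.dumps)


def encode(obj):
    # Runs the import statement on every call, but the module is not
    # reloaded; only the local name `json` is bound again.
    import json
    return json.dumps(obj), json


text, module_seen = encode({"a": 1})
print(module_seen is first)
```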

Importing in a function makes a difference for two reasons: it won't put that name in the global namespace for the module (so no namespace pollution), and because the name is now local, using that name is slightly faster than using a global.
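A quick way to see both effects, sketched with the standard-library `math` module (the function `area` is a made-up example): the imported name becomes a local variable of the function and never shows up in the module's globals.

```python
def area(radius):
    from math import pi     # `pi` is a local name inside area()
    return pi * radius ** 2


value = area(1.0)

# `pi` is listed among the function's local variables...
print("pi" in area.__code__.co_varnames)

# ...and was never added to the module's global namespace.
print("pi" in globals())
```

The speed difference between a local and a global lookup is real but tiny, so this is rarely a reason on its own to move an import into a function.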

If you want to improve performance, focus on code that is being repeated many, many times. Importing is not it.

Upvotes: 6
