Python: Is it better to have many variables or to store their values in lists?

Question

I am aware that this may seem like a vague question, but I wonder if (e.g. in Python) it is preferable to store multiple values in separate variables, or to store them in logical groups (lists, arrays...).

In my specific case I am supposed to translate MATLAB code into Python 2.7. It is a physically based model that takes 8 input variables and produces two large lists as output. I found that the original model has a huge number of intermediate variables that are calculated along the way (>100). The rule of thumb seems to be: if the result of a calculation is used more than once, it is stored in a new variable. A demonstrative example:

from math import exp
x = 3
y = 5
x2 = x**2
z = x2 + exp(y)
zz = (y+x)/x2

x**2 is used twice (for the calculation of z and zz), so it is stored as x2. Is this really faster than letting Python calculate x**2 twice? Also, would it be faster if I stored the values in lists? Like this:

x = [3, 5]
z = x[0]**2 + exp(x[1])
zz = sum(x)/x[0]**2

Organizing the variables in lists may come at the expense of readability, but I would gladly accept that if it makes my code run faster.

TheBlackCat · Accepted Answer

I can see no performance advantage to keeping the values in a list. On the contrary, putting them in a list makes the code slower:

>>>  %%timeit
...: x = 3
...: y = 5
...: x2 = x**2
...: z = x2 + exp(y)
...: zz = (y+x)/x2
...: 
337 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>>  %%timeit
...: x = [3, 5]
...: z = x[0]**2 + exp(x[1])
...: zz = sum(x)/x[0]**2
...: 
716 ns ± 4.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Part of that is because you are calculating x**2 twice in the list version, but even fixing that issue doesn't make the list version faster:

>>>  %%timeit
...: x = [3, 5]
...: x0 = x[0]**2
...: z = x0 + exp(x[1])
...: zz = sum(x)/x0
...: 
502 ns ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
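`%%timeit` is an IPython cell magic. Outside IPython, the standard-library `timeit` module can reproduce these comparisons. Below is a minimal sketch, assuming Python 3.6+ (for the f-string). The snippet strings copy the cells above; the `vars, recompute` entry is an added variant, not part of the measurements above, that evaluates `x**2` twice with plain variables, which bears directly on the question of whether storing `x2` helps. It reports the best run rather than IPython's mean ± std. dev., and absolute numbers will differ by machine and Python version.

```python
import timeit

# Shared setup: import exp once, outside the timed statements
SETUP = "from math import exp"

# Each statement mirrors one of the cells above, except "vars, recompute",
# which is an extra variant that computes x**2 twice instead of storing it.
SNIPPETS = {
    "vars, cached x2": "x = 3\ny = 5\nx2 = x**2\nz = x2 + exp(y)\nzz = (y+x)/x2",
    "vars, recompute": "x = 3\ny = 5\nz = x**2 + exp(y)\nzz = (y+x)/x**2",
    "list, recompute": "x = [3, 5]\nz = x[0]**2 + exp(x[1])\nzz = sum(x)/x[0]**2",
    "list, cached x0": "x = [3, 5]\nx0 = x[0]**2\nz = x0 + exp(x[1])\nzz = sum(x)/x0",
}


def best_time_ns(stmt, number=50000, repeat=5):
    """Return the best per-loop time in nanoseconds over `repeat` runs."""
    times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number * 1e9


if __name__ == "__main__":
    for label, stmt in SNIPPETS.items():
        print(f"{label:16s} {best_time_ns(stmt):8.1f} ns per loop")
```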

If you care about performance, another big issue is that you are defining ints that later have to be converted to floats. In MATLAB, x = 5 creates a double-precision float, while in Python it creates an integer. It is much faster to do everything with floats from the beginning, which you can do by just putting a . or .0 at the end of the number:

>>>  %%timeit
...: x = 3.0
...: y = 5.0
...: x2 = x**2.0
...: z = x2 + exp(y)
...: zz = (y+x)/x2
...: 
166 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
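For a Python 2.7 translation, ints are also a correctness issue: in Python 2.7, `/` between two ints is floor division (unless `from __future__ import division` is used), so `(y+x)/x2` with the original int inputs gives 0 rather than 0.888…. Below is a minimal sketch, written for Python 3, of converting the inputs to float once on entry; `model` is a hypothetical stand-in for the real 8-input function, and the last line uses `//` to mimic what Python 2.7's `/` does with ints.

```python
from math import exp


def model(x, y):
    """Hypothetical stand-in for the translated model (the real one takes 8 inputs).

    Inputs are converted to float once, on entry, so every later step
    works on floats and no further int-to-float conversions are needed.
    """
    x = float(x)
    y = float(y)
    x2 = x ** 2           # used twice below, so computed once and stored
    z = x2 + exp(y)
    zz = (y + x) / x2     # true division, because both operands are floats
    return z, zz


z, zz = model(3, 5)       # int arguments are converted once, inside model()

# What Python 2.7's `/` does with the original int inputs (floor division):
zz_int_py27 = (5 + 3) // 3 ** 2   # 0, not 8/9
```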

If you were to use NumPy arrays rather than lists, it is even worse: the list of floats first has to be converted into an array, each element you index out comes back as a NumPy scalar rather than a plain Python float, and arithmetic on those scalars carries extra overhead. All of this is slow:

>>> %%timeit
...: x = np.array([3., 5.])
...: x0 = x[0]**2.
...: z = x0 + np.exp(x[1])
...: zz = x.sum()/x0
...: 
3.22 µs ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
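Part of that overhead is visible in the types involved. A small sketch, assuming NumPy is installed, showing that indexing an array returns a `numpy.float64` scalar rather than a Python `float`, and that `.tolist()` converts the elements back to plain Python floats:

```python
import numpy as np

x = np.array([3., 5.])   # the list of Python floats is copied into a new array
first = x[0]             # indexing returns a NumPy scalar, not a Python float
print(type(first))       # <class 'numpy.float64'>

# .tolist() converts every element back to a plain Python float
x_list = x.tolist()
print(type(x_list[0]))   # <class 'float'>
```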

As a general rule, avoid type conversions wherever possible, and avoid indexing when it doesn't help readability. If you have many values to operate on together, converting to NumPy is worthwhile. But for just two or three values it is going to hurt both speed and readability.
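Where NumPy does pay off is when the same formula is applied to many values at once. A sketch with made-up inputs (`n`, `xs` and `ys` are hypothetical), computing the question's `z` and `zz` for every input pair in one vectorized pass and, for comparison, with a plain Python loop over the same values:

```python
from math import exp

import numpy as np

n = 100000                        # hypothetical number of input pairs
xs = np.linspace(1.0, 10.0, n)    # made-up inputs, for illustration only
ys = np.linspace(2.0, 4.0, n)

# NumPy: each line is one vectorized operation over all n values
x2 = xs ** 2
z_vec = x2 + np.exp(ys)
zz_vec = (ys + xs) / x2

# Plain Python: the same arithmetic, one pair of floats at a time
z_loop, zz_loop = [], []
for x, y in zip(xs.tolist(), ys.tolist()):
    x2_scalar = x ** 2
    z_loop.append(x2_scalar + exp(y))
    zz_loop.append((y + x) / x2_scalar)
```

Both approaches compute the same values; the difference is that the vectorized version does each step once over the whole array instead of once per element in the interpreter.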
