Reputation: 2801
I am aware that this may seem like a vague question, but I wonder whether (e.g. in Python) it is preferable to store multiple values in separate variables or in logical groups (lists, arrays, ...).
In my specific case, I am supposed to translate MATLAB code into Python 2.7. It is a physically based model that takes 8 input variables and produces two large lists as output. I found that the original model computes a huge number of intermediate variables along the way (>100). As a rule of thumb, if a calculation is used more than once, its result is stored in a new variable. A small example:
from math import exp
x = 3
y = 5
x2 = x**2
z = x2 + exp(y)
zz = (y+x)/x2
x**2 is used twice (to calculate both z and zz), so it is stored as x2. Is this really faster than letting Python calculate x**2 twice? Also, would it be faster if I stored the values in lists? Like this:
from math import exp
x = [3, 5]
z = x[0]**2 + exp(x[1])
zz = sum(x)/x[0]**2
Organizing the variables in lists may come at the expense of readability, but I would gladly accept that if it makes my code run faster.
Upvotes: 0
Views: 1072
Reputation: 10298
I see no performance advantage in keeping the values in a list. On the contrary, putting them in a list makes the code slower:
>>> %%timeit
...: x = 3
...: y = 5
...: x2 = x**2
...: z = x2 + exp(y)
...: zz = (y+x)/x2
...:
337 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %%timeit
...: x = [3, 5]
...: z = x[0]**2 + exp(x[1])
...: zz = sum(x)/x[0]**2
...:
716 ns ± 4.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Part of that is because you are calculating x**2 twice in the list version, but even fixing that doesn't make the list version faster:
>>> %%timeit
...: x = [3, 5]
...: x0 = x[0]**2
...: z = x0 + exp(x[1])
...: zz = sum(x)/x0
...:
502 ns ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
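To rerun this comparison outside IPython, here is a minimal sketch using the standard-library timeit module. It also times a variant that recomputes x**2 instead of caching it, which bears directly on the question of whether storing x2 helps. The names VARIANTS and benchmark are hypothetical; absolute numbers will differ from the ones above depending on machine and Python version.

```python
import timeit

# Setup code runs once per repeat and is not included in the timing.
SETUP = "from math import exp"

# Each statement mirrors one of the variants discussed above.
VARIANTS = {
    "separate variables": (
        "x = 3\n"
        "y = 5\n"
        "x2 = x**2\n"
        "z = x2 + exp(y)\n"
        "zz = (y + x) / x2\n"
    ),
    "recompute x**2": (
        "x = 3\n"
        "y = 5\n"
        "z = x**2 + exp(y)\n"
        "zz = (y + x) / x**2\n"
    ),
    "list, cached square": (
        "x = [3, 5]\n"
        "x0 = x[0]**2\n"
        "z = x0 + exp(x[1])\n"
        "zz = sum(x) / x0\n"
    ),
}


def benchmark(number=100000, repeat=5):
    """Return the best per-loop time in nanoseconds for each variant."""
    results = {}
    for name, stmt in VARIANTS.items():
        # Taking the minimum over several repeats reduces timer noise.
        best = min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        results[name] = best / number * 1e9
    return results


if __name__ == "__main__":
    for name, ns in benchmark().items():
        print("{:<22} {:8.1f} ns per loop".format(name, ns))
```

Taking the minimum rather than the mean is a deliberate choice here: slower runs are usually caused by other activity on the machine, not by the code being measured.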
If you care about performance, another big issue is that you are defining ints and then converting them to floats. In MATLAB, x = 5 creates a float (a double), while in Python it creates an int. It is much faster to work with floats from the beginning (in the timing below, it roughly halves the run time), which you can do by appending a . or .0 to the number, e.g. 3. or 3.0:
>>> %%timeit
...: x = 3.0
...: y = 5.0
...: x2 = x**2.0
...: z = x2 + exp(y)
...: zz = (y+x)/x2
...:
166 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
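Starting from floats also matters for correctness in a Python 2.7 port. In Python 2, / between two ints truncates, so (5 + 3) / 9 evaluates to 0, whereas MATLAB (and Python 3) give about 0.889. Putting from __future__ import division at the top of each module restores true division in 2.7. Another option is to convert the inputs once, where they enter the model, as in this minimal sketch (model is a hypothetical name, standing in for the real 8-input function):

```python
from math import exp


def model(x, y):
    """Compute z and zz from two scalar inputs (toy version of the example)."""
    # Convert once at the entry point, so every later operation is float
    # arithmetic. Under Python 2.7 this also guarantees that "/" below is
    # true division rather than truncating integer division.
    x = float(x)
    y = float(y)
    x2 = x**2            # used twice, so computed once
    z = x2 + exp(y)
    zz = (y + x) / x2    # 8/9, not 0, even if ints were passed in
    return z, zz


z, zz = model(3, 5)      # int arguments are converted inside
print("z = %r, zz = %r" % (z, zz))
```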
If you were to use NumPy arrays rather than lists, it is even worse: you start with a list of floats, then have to convert both the numbers and the list into an array, then convert the elements back when you index them, all of which is slow:
>>> %%timeit
...: x = np.array([3., 5.])
...: x0 = x[0]**2.
...: z = x0 + np.exp(x[1])
...: zz = x.sum()/x0
...:
3.22 µs ± 8.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
As a general rule, avoid type conversions wherever possible, and avoid indexing when it doesn't help readability. If you have many values, converting them to NumPy is worthwhile, but for just two or three it hurts both speed and readability.
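To illustrate the case where NumPy does pay off, here is a minimal sketch that applies the same formulas to many (x, y) pairs at once, so the one-time conversion cost is spread over all of them. It assumes each position in the arrays is an independent case; model_vectorized is a hypothetical name, and where the break-even point lies depends on the array size, so it is worth timing on real data.

```python
import numpy as np


def model_vectorized(x, y):
    """Apply the example formulas element-wise to arrays of inputs.

    x and y are 1-D sequences of equal length; each position is one
    independent (x, y) case.
    """
    x = np.asarray(x, dtype=float)   # a single conversion, up front
    y = np.asarray(y, dtype=float)
    x2 = x**2                        # element-wise; computed once, used twice
    z = x2 + np.exp(y)
    zz = (y + x) / x2
    return z, zz


# 100000 cases evaluated without a Python-level loop.
xs = np.linspace(1.0, 10.0, 100000)
ys = np.linspace(0.0, 5.0, 100000)
z, zz = model_vectorized(xs, ys)
```

The caching of x2 carries over unchanged: it is still computed once and reused, just for a whole array instead of a single number.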
Upvotes: 1