user13860724

How to optimise code of math operation of several lists

I am new to Python and Stack Overflow. I have the following lists:

x1=[345,567,234,456]
x2=[345,567,23,67]
x3=[345,675,56,67]
x4=[546,234,1234,45]

I want to perform a math operation on them, and my current approach is repetitive and time-consuming:

a=((x1[0]*x2[0])+(x4[0]/x3[0]))/(x1[0]/x1[1])

The output is 195617.6.

But for larger sets of lists it is hard to do such math operations this way. Is there a more efficient way?

Upvotes: 1

Views: 125

Answers (1)

Niko Fohr

Reputation: 33810

1. Numpy

  • For numerical operations, especially on large amounts of data, NumPy arrays are usually a good alternative to plain Python lists.
  • Speed: I tested against Günel's answer (which uses map) and there were no notable performance gains with lists of 2.5 million elements.
  • Benefits: with NumPy arrays the syntax is quite clear, and NumPy has some very handy array methods (which saves development time).
import numpy as np

x1 = np.array([345, 567, 234, 456])
x2 = np.array([345, 567, 23, 67])
x3 = np.array([345, 675, 56, 67])
x4 = np.array([546, 234, 1234, 45])

# Values of x1 but shifted by one
x1_shifted = np.append(x1[1:], np.nan)

# array([195617.60098299, 132678.14306878,  10530.94139194,             nan])
out = ((x1 * x2) + (x4 / x3)) / (x1 / x1_shifted)
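As a side note (my addition, not part of the original answer), the NaN padding can be avoided entirely by slicing: each result needs `x1[i+1]`, so slicing off the last element gives an output one entry shorter, with no NaN.

```python
import numpy as np

x1 = np.array([345, 567, 234, 456])
x2 = np.array([345, 567, 23, 67])
x3 = np.array([345, 675, 56, 67])
x4 = np.array([546, 234, 1234, 45])

# Same formula, but sliced: x1[:-1] pairs element i with x1[1:][i] == x1[i+1],
# so the result has len(x1) - 1 entries and no NaN padding.
out_trimmed = ((x1[:-1] * x2[:-1]) + (x4[:-1] / x3[:-1])) / (x1[:-1] / x1[1:])
# array([195617.60098299, 132678.14306878,  10530.94139194])
```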

2. Numba

  • Numba can be used to make your code run faster.
  • Idea: write your function with explicit loops (Numba loves looping) and use NumPy arrays inside the function. Initializing arrays is fast with np.empty(). Then just decorate your function with numba.njit.
import numba
import numpy as np

@numba.njit
def f_numba(x1_lst, x2_lst, x3_lst, x4_lst):
    # One result per consecutive pair of x1 values, hence len - 1
    out = np.empty(len(x1_lst) - 1)

    zp = zip(x1_lst, x2_lst, x3_lst, x4_lst)
    for i, (x1, x2, x3, x4) in enumerate(zp):
        if i == len(out):  # the last element has no successor
            break
        out[i] = ((x1 * x2) + (x4 / x3)) / (x1 / x1_lst[i + 1])

    return out
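To see that the loop computes the same values as the vectorized NumPy expression, here is a plain-Python version of the same function (my addition; the `numba.njit` decorator is omitted so this sketch runs without Numba installed, and the decorated version computes identical values):

```python
import numpy as np

# Plain-Python equivalent of f_numba above, using index arithmetic
# instead of zip/enumerate; same formula, same output length.
def f_loop(x1_lst, x2_lst, x3_lst, x4_lst):
    out = np.empty(len(x1_lst) - 1)
    for i in range(len(out)):
        out[i] = ((x1_lst[i] * x2_lst[i]) + (x4_lst[i] / x3_lst[i])) \
                 / (x1_lst[i] / x1_lst[i + 1])
    return out

x1 = np.array([345, 567, 234, 456])
x2 = np.array([345, 567, 23, 67])
x3 = np.array([345, 675, 56, 67])
x4 = np.array([546, 234, 1234, 45])

res = f_loop(x1, x2, x3, x4)
# res[0] ≈ 195617.6, matching the question's scalar computation
```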

Speed test results

With this setup, the Numba version running on NumPy arrays is about 47 times faster than the lambda+map implementation (2.55 ms vs 122 ms).

# Reference speed
In [1]: timeit f_map_lambda(x1, x2, x3, x4)
122 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# All inputs and outputs are forced to be lists
In [2]: timeit list(f_numba(np.array(x1), np.array(x2), np.array(x3), np.array(x4)))
101 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Inputs are forced to be lists, but output can be np.array
In [3]: timeit f_numba(np.array(x1), np.array(x2), np.array(x3), np.array(x4))
75.2 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Inputs and output are numpy arrays
In [4]: timeit f_numba(x1_arr, x2_arr, x3_arr, x4_arr)
2.55 ms ± 984 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

If you really need speed optimization, you should test the code with your own data; the length of the data arrays, for example, plays a huge role.

Appendix

Setup used for speed benchmarking

import random
random.seed(42)
import numba 
import numpy as np 

length = 250000
x1 = [random.randint(20,500) for x in range(length)]
x2 = [random.randint(20,500) for x in range(length)]
x3 = [random.randint(20,500) for x in range(length)]
x4 = [random.randint(20,500) for x in range(length)]

# The numpy array counterparts
x1_arr = np.array(x1)
x2_arr = np.array(x2)
x3_arr = np.array(x3)
x4_arr = np.array(x4)

Reference method: f_map_lambda

As a reference, here is the lambda+map approach from Günel's answer (casting x5 to a list is omitted since it is not needed):

def f_map_lambda(x1, x2, x3, x4):
    # x5[i] = x1[i] / x1[i+1]; map stops at the shorter iterable
    x5 = map(lambda a, b: a / b, x1, x1[1:])
    x6 = list(map(lambda a, b, c, d, e: ((a * b) + (d / c)) / e, x1, x2, x3, x4, x5))
    return x6
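A quick sanity check (my addition) that the map-based reference reproduces the question's scalar computation on the sample lists; because `map` stops at the shortest iterable, the output has one entry fewer than the inputs:

```python
def f_map_lambda(x1, x2, x3, x4):
    # x5[i] = x1[i] / x1[i+1]; map stops at the shorter iterable
    x5 = map(lambda a, b: a / b, x1, x1[1:])
    return list(map(lambda a, b, c, d, e: ((a * b) + (d / c)) / e,
                    x1, x2, x3, x4, x5))

x1 = [345, 567, 234, 456]
x2 = [345, 567, 23, 67]
x3 = [345, 675, 56, 67]
x4 = [546, 234, 1234, 45]

out = f_map_lambda(x1, x2, x3, x4)
# out[0] == ((345*345) + (546/345)) / (345/567) ≈ 195617.6
```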

Upvotes: 1
