Reputation: 103
I'm new to Python so I've decided to solve some common challenges to improve my knowledge of the language. I learned about numpy and its efficient ndarrays so I attempted the following experiment:
Consider the 2 sum problem (e.g. here) and let's solve it the naive way (it doesn't matter for the purpose of this question). Here's a solution with python's lists:
from itertools import combinations
def twosum1(n_lst):
pairs=list(combinations(n_lst,2))
solutions=[]
for pair in pairs:
if sum(pair)==7: solutions.append(pair)
return(solutions)
Then I created a version using np.arrays expecting it will drastically speed up the calculation:
from itertools import combinations
import numpy as np
def twosum2(n_lst):
pairs=np.array(list(combinations(n_lst,2)),dtype=int)
return pairs[pairs[:,1]+pairs[:,0]==7]
However, after timing the two functions, twosum2 is about 2x slower than twosum1. So I thought that the problem maybe in the dynamical selection of elements, so I've written an exact copy of twosum1 by replacing lists with ndarrays ...
def twosum3(n_lst):
pairs=np.array(list(combinations(n_lst,2)))
solutions=np.empty((0,2))
for pair in pairs:
if np.sum(pair)==7:
solutions=np.append(solutions,[pair],axis=0)
return(solutions)
... and the resulting function was 10x slower than the original!
How is this possible? What I'm I doing wrong here? Clearly, removing loops and replacing lists with ndarrays is not enough to gain speed (contrary to what I learned reading this).
Edit:
Upvotes: 0
Views: 423
Reputation: 18668
The costly operation is np.array(list(combinations(n_lst,2)),dtype=int)
because python must scan each member of the list, check if member is 'int compatible', convert it in integer and store it in the array.
To reach numpy performance, you must conceive all the algorithm in numpy. For example :
In [63]: n_lst=list(range(100))
In [64]: %timeit twosum1(n_lst)
11.2 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [65]: np.vstack(np.where(np.add.outer(n_lst,n_lst)==7)).T
Out[65]:
array([[0, 7],
[1, 6],
[2, 5],
[3, 4],
[4, 3],
[5, 2],
[6, 1],
[7, 0]], dtype=int64)
In [66]: %timeit np.vstack(np.where(np.add.outer(n_lst,n_lst)==7)).T
306 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This way you will win a 30 to 100 factor , depending of the problem.
Upvotes: 3