Reputation: 195
For a project, I need to generate sample from function. I would like to be able to generate those samples as quickly as possible.
I have this example (in the final version, the function
lambda will be provided in the arguments) The goal is to generate ys
of the n points linespaced xs
between start
and stop
using the lambda function
.
def get_ys(coefficients, num_outputs=20, start=0., stop=1.):
function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
ys = [function(x, coefficients) for x in xs]
return ys
%%time
n = 1000
xs = np.random.random((n,5))
ys = np.apply_along_axis(get_ys, 1, xs)
Wall time: 616 ms
I am trying to vectorize it, and found numpy.apply_along_axis
%%time
for i in range(1000):
xs = np.random.random(5)
ys = get_ys(xs)
Wall time: 622 ms
Unfortunately it is still pretty slow :/
I am not so familiar with function vectorization, can someone guide me a little bit on how to improve the speed of the script ?
Thanks!
Edit: example of input/output:
xs = np.ones(5)
ys = get_ys(xs)
[1.0, 0.9501385041551247, 0.9058171745152355, 0.8670360110803323, 0.8337950138504155,0.8060941828254848, 0.7839335180055402, 0.7673130193905817, 0.7562326869806094, 0.7506925207756232, 0.7506925207756232, 0.7562326869806094, 0.7673130193905817, 0.7839335180055401, 0.8060941828254847, 0.8337950138504155, 0.8670360110803323, 0.9058171745152354, 0.9501385041551246, 1.0]
Upvotes: 1
Views: 90
Reputation: 231738
def get_ys(coefficients, num_outputs=20, start=0., stop=1.):
function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
ys = [function(x, coefficients) for x in xs]
return ys
You are trying to get around calling get_ys
1000 times, once for each row of xs
.
What will it take to pass xs
as a whole to get_ys
? In other words, what if coefficients
was (n,5) instead of (5,)?
xs
is (20,), and the ys
will be same (right)?
The lambda is write to expect a scalar x
and (5,) args. Can it be changed to work with a (20,) x
and (n,5) args
?
As a first step, what does function
produce if given xs
? That is instead of
ys = [function(x, coefficients) for x in xs]
ys = function(xs, coefficients)
As written your code iterates (at slow Python speeds) of the n
(1000) rows, and the 20 linspace
. So function
is called 20,000 times. That's what makes your code slow.
A sample run with your function:
In [126]: np.array(get_ys(np.arange(5)))
Out[126]:
array([-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ])
Replace the list comprehension with just one call to function
:
In [127]: def get_ys1(coefficients, num_outputs=20, start=0., stop=1.):
...: function = lambda x, args: args[0]*(x-args[1])**2 + args[2]*(x-args[3]) + args[4]
...:
...: xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
...: ys = function(xs, coefficients)
...: return ys
...:
...:
Same values:
In [128]: get_ys1(np.arange(5))
Out[128]:
array([-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ])
Comparative timings:
In [129]: timeit np.array(get_ys(np.arange(5)))
345 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [130]: timeit get_ys1(np.arange(5))
89.2 µs ± 162 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
That's what we mean by "vectorization" - replacing python level iterations (a list comprehension) with an equivalent that makes fuller user of numpy
array methods.
I suspect we can move on to work with a (n,5) coefficients
, but this should be enough to get you started.
By broadcasting
the (n,5) against (20,) I can get a function that does not have any python loops:
def get_ys2(coefficients, num_outputs=20, start=0., stop=1.):
function = lambda x, args: args[:,0]*(x-args[:,1])**2 + args[:,2]*(x-args[:,3]) + args[:,4]
xs = np.linspace(start, stop, num=num_outputs, endpoint=True)
ys = function(xs[:,None], coefficients)
return ys.T
And with a (1,5) input:
In [156]: get_ys2(np.arange(5)[None,:])
Out[156]:
array([[-2. , -1.89473684, -1.78947368, -1.68421053, -1.57894737,
-1.47368421, -1.36842105, -1.26315789, -1.15789474, -1.05263158,
-0.94736842, -0.84210526, -0.73684211, -0.63157895, -0.52631579,
-0.42105263, -0.31578947, -0.21052632, -0.10526316, 0. ]])
With your test case:
In [146]: n = 1000
...: xs = np.random.random((n,5))
...: ys = np.apply_along_axis(get_ys, 1, xs)
In [147]: ys.shape
Out[147]: (1000, 20)
Two timings:
In [148]: timeit ys = np.apply_along_axis(get_ys, 1, xs)
...:
106 ms ± 303 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [149]: timeit ys = np.apply_along_axis(get_ys1, 1, xs)
...:
88 ms ± 98.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
and testing this
In [150]: ys2 = get_ys2(xs)
In [151]: ys2.shape
Out[151]: (1000, 20)
In [152]: np.allclose(ys, ys2)
Out[152]: True
In [153]: timeit ys2 = get_ys2(xs)
424 µs ± 484 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
It matches values, and improves speed a lot.
In the new function, args
can now be (n,5). And if x
is (20,1), the result is (20,n), which I transpose on the return.
Upvotes: 1