Reputation: 835
I noticed that the following code takes 0.01 seconds which is too much given that I need it to run around a million times times.
mydist = stats.norm(mean, 2)
for x in range(5,30):
probplak.append(mydist.pdf(x))
Is there any other way to get a similar result (a list of values giving a mean and x values away from the mean giving the standard deviation?).
Upvotes: 3
Views: 1666
Reputation: 879113
mydist.pdf
can be passed a sequence of values instead of a single scalar:
probplak = mydist.pdf(range(5, 30))
This produces the result much quicker than using a for-loop.
For example,
import scipy.stats as stats
mean = 15
mydist = stats.norm(mean, 2)
probplak = mydist.pdf(range(5, 30))
# array([7.43359757e-07, 7.99187055e-06, 6.69151129e-05, 4.36341348e-04,
# 2.21592421e-03, 8.76415025e-03, 2.69954833e-02, 6.47587978e-02,
# 1.20985362e-01, 1.76032663e-01, 1.99471140e-01, 1.76032663e-01,
# 1.20985362e-01, 6.47587978e-02, 2.69954833e-02, 8.76415025e-03,
# 2.21592421e-03, 4.36341348e-04, 6.69151129e-05, 7.99187055e-06,
# 7.43359757e-07, 5.38488002e-08, 3.03794142e-09, 1.33477831e-10,
# 4.56736020e-12])
Using IPython to perform a speed comparison:
In [112]: %timeit mydist.pdf(range(5, 30)) # <-- passing sequence
10000 loops, best of 3: 134 µs per loop
In [113]: %timeit [mydist.pdf(x) for x in range(5,30)] # <-- using a loop
100 loops, best of 3: 2.95 ms per loop
Also note that stats.norm.pdf
can be passed array-like arguments not only for x
but also for the loc
(i.e. mean
) and scale
(i.e. standard deviation).
That means that if you need to run this code for a million
different means and scales, then you could compute
all the probplak
s at once with one call to stats.norm.pdf
:
means = [1,2,3]
std_devs = [10,20,30]
probplaks = stats.norm.pdf(np.arange(5, 30)[:,None], means, std_devs)
returns an array of shape (25, 3)
. If you change means
and std_devs
to lists with a million values, then probplaks.shape
would be (25, 10**6)
.
Computing all the probplaks with one call to stats.norm.pdf
will be much faster than a million calls to mydist.pdf
in a Python loop:
In [117]: %timeit stats.norm.pdf(np.arange(5, 30)[:,None], means, scales)
10000 loops, best of 3: 135 µs per loop
In [118]: %timeit [[stats.norm(m, s).pdf(x) for x in range(5,30)] for m,s in zip(means,scales)]
10 loops, best of 3: 64.4 ms per loop
Upvotes: 4