Reputation: 87
I'm new with Python. I have to create a new rv U(t)
. Assuming that Z
has a standard normal distribution c = 1.57
, I have that:
U(t) = 0 if Z(t) <= c
U(t) = Φ(Z(t)) Z(t) > c
Where Φ(·)
is the cdf
of the standard normal distribution N(0, 1)
.
I start sampling random numbers from the normal distribution and I create an array of zeros:
z = np.random.normal(0, 1, 100)
u = np.zeros([1, 100])
Then, I write the following loop:
for i in list(range(1, 101, 1)):
if z[:i] < c:
u[:i] = 0
else z[:i] > c:
u[:i] = norm.cdf(z[:i], loc=0, scale=1)
However, there is something wrong. I got this error:
File "<ipython-input-837-1cbdd0641a75>", line 4
else z[:i] > c:
^
SyntaxError: invalid syntax
Can someone please help me find out where the error is? Or suggest another way to deal with the problem?
Thank you!
Upvotes: 1
Views: 74
Reputation: 25489
Use the power of numpy!
z = np.random.normal(0, 1, 100)
u = np.zeros(z.shape)
Since you initialized u
to zeros, you don't need to do anything for the z <= c
cases. For the others, you can use numpy's logical indexing to only set the elements that fulfill the condition
# Get only the elements of z where z > c
z_filt = z[z > c]
# Calculate the norm.cdf at these values of z,
norm_vals = norm.cdf(z_filt, loc=0, scale=1)
# Assign those to the elements of u where z > c
u[z > c] = norm_vals
Of course, you can condense these three lines down to a single line:
u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
This approach will be significantly faster than iterating over the arrays and setting individual elements.
If you're curious why your code didn't work and how to fix it,
range
to iterate on it. Just for i in range(100)
is good enoughz[:i] < c
will give an array containing i
boolean values. Putting an if
condition on that will give you an error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
. I suspect you meant to do z[i] < c
u[:i] = 0
will set all elements of u
from the start to the i
th index to zero. You probably only wanted u[i] = 0
. This is really not even necessary since you already initialized u
to be zeroselse <condition>
doesn't work. You want elif <condition>
. Although you don't really need that here, since you could just do if z[i] > c
and that's the only condition you need.for i in range(100):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
Comparing the runtimes of these two approaches:
import timeit
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
def f_numpy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
u[z > c] = norm.cdf(z[z > c], loc=0, scale=1)
return u
def f_loopy(size):
z = np.random.normal(0, 1, size)
u = np.zeros(z.shape)
for i in range(size):
if z[i] > c:
u[i] = norm.cdf(z[i], loc=0, scale=1)
return u
c = 0
sizes = [10, 100, 1000, 10_000, 100_000]
times = np.zeros((len(sizes), 2))
for i, s in enumerate(sizes):
times[i, 0] = timeit.timeit('f_numpy(s)', globals=globals(), number=100) / 100
print(">")
times[i, 1] = timeit.timeit('f_loopy(s)', globals=globals(), number=10) / 10
print(".")
fig, ax = plt.subplots()
ax.plot(sizes, times[:, 0], label="numpy")
ax.plot(sizes, times[:, 1], label="loopy")
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Array size')
ax.set_ylabel('Time per function call (s)')
ax.legend()
fig.tight_layout()
we get the following plot, which shows that the numpy approach is orders of magnitude faster than the loopy approach.
Upvotes: 1