Reputation: 4264
I am writing a simulation of cancer cell population growth, using numpy.random functions to model cells acquiring or losing mutations. Profiling shows that the bottleneck (around 70% of the runtime) is the first few lines that contain the numpy.random calls. Here the variable num_steps is a large number, around one million:
def simulate(mu, gamma, beta, num_steps, threshold):
    mutation_num = 0  # the index of the mutation (we assume each mutation only occurs once)
    population = {(): 1}  # maps a tuple of mutations to the number of cells carrying them
    for epoch in range(num_steps):
        next_population = {}
        for mutations, size in population.items():
            born = np.random.binomial(size, birth_rate)
            if np.random.binomial(born, gamma):
                return True
            mut_loss = 0  # initializing in case the variable is not created below
            if mutations:
                mut_gain, mut_loss, mut_same = np.random.multinomial(born, [mu, beta, 1-mu-beta])
            else:
                mut_gain, mut_same = np.random.multinomial(born, [mu, 1-mu])
            .....
Is there a way to make the np.random.binomial and np.random.multinomial functions run faster? I tried using Cython, but that did not help.
Upvotes: 0
Views: 1093
Reputation: 231325
To illustrate my comment:
In [81]: timeit np.random.binomial(1,1,1000)
46.4 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [82]: %%timeit
...: for _ in range(1000):
...: np.random.binomial(1,1)
...:
4.77 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If possible, generate many random values with one call rather than one at a time.
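As a sketch of how this might apply to the inner loop in the question: np.random.binomial (and the newer Generator.binomial) broadcasts over an array of n values, so the per-subpopulation draws can be done in one call instead of one call per dict entry. The variable values below (sizes, birth_rate, gamma) are illustrative, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng()  # the Generator API is generally faster than the legacy np.random functions

# Hypothetical per-epoch setup: the sizes of all subpopulations collected into one array
sizes = np.array([120, 45, 7, 300])
birth_rate = 0.5   # illustrative value
gamma = 1e-6       # illustrative value

# One vectorized call replaces a Python-level loop of single binomial draws:
born = rng.binomial(sizes, birth_rate)   # one draw per subpopulation
hits = rng.binomial(born, gamma)         # second stage, also vectorized
threshold_reached = hits.any()           # replaces the early-return check inside the loop
```

The savings come from paying the per-call overhead once per epoch instead of once per subpopulation; the multinomial draws are harder to batch this way because the probability vector differs between the `if mutations:` branches, so those may need to be grouped by branch first.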
Upvotes: 2