Reputation: 11
I created a quick monte-carlo simulation which seems to do what I want (simple version below).
import numpy as np
import os
import scipy.stats as sp
sigma = 0.5
u = 10
mu = 100
no_sim = int(10e3)
no_col = 2
mat_res = np.zeros((no_sim,no_col)) #results matrix holder
lim = 40e3 # parameter 1 - needs to vary
xs = 2000 # parameter 2 - needs to vary
for i in range(1,no_sim):
no_clm = sp.poisson.rvs(mu=mu,size=1)
clm_sev = sp.lognorm.rvs(s=sigma,scale=np.exp(u),size=no_clm)
temp_mat = np.zeros((np.size(clm_sev),2)) # temp matrix for calculations
temp_mat[:,0] = clm_sev
temp_mat[:,1] = np.minimum(lim,np.maximum(0,temp_mat[:,0]-xs)) #want to expand this step for various values of lim and xs, e.g. 5 different options
mat_res[i,0] = np.sum(temp_mat[:,0])
mat_res[i,1] = np.sum(temp_mat[:,1])
#want these mean values for various different lim and xs values which are predefined
print("mean 1 is %s" % np.mean(mat_res[:,0]),
"mean 2 is %s" % np.mean(mat_res[:,1]))
I am trying to do two things:
Speed up the efficiency of the code, I will need to run over a million simulations in reality so want to do this in the best way possible - currently it takes a five minutes or so to run a million simulations
expand the code in an efficient way so that I can vary the parameters lim
and xs
and get a new result - i.e. right now I have lim = 40e3
and xs = 2000
but I would also want to run it with say lim = 50e3
and xs = 1000
and then return back the mean value in the print (along with the original mean for the original lim
and xs
parameters). One solution is to wrap the for-loop into a function with the parameters I require, however as I only use the lim
and xs
parameters in one line in the monte carlo simulation I don't think it's efficient to run the whole simulation again from scratch, but I can't think of any good way to build it into the for-loop as it stands.
Upvotes: 0
Views: 1754
Reputation: 25180
I found that you can get a 20x speedup by using vectorization and doing more of the operations in numpy.
I did need to make some slight changes to the simulation itself, though. Specifically, instead of sampling no_clm every time through the loop, I sample it once per "batch," and use the same value for the entire batch.
Once I make this simplification, it's possible to create a matrix of log-norm sampled values, and apply the clipping and summation to each row of that matrix. This means that more of the calculation happens in numpy.
Here's the code:
def sim(xs, lim, no_sim):
mat_res = np.zeros((no_sim,no_col)) #results matrix holder
batch_size = 100
for i in range(0, no_sim, batch_size):
if i + batch_size >= no_sim:
batch_size = no_sim - i
no_clm = sp.poisson.rvs(mu=mu,size=1)[0]
clm_sev = sp.lognorm.rvs(s=sigma,scale=np.exp(u),size=(batch_size, no_clm))
mat_res[i:i+batch_size,0] = np.sum(clm_sev, axis=1)
mat_res[i:i+batch_size,1] = np.sum(np.clip(clm_sev - xs, 0, lim), axis=1)
return mat_res
Benchmark results:
Method | Timing |
---|---|
No vectorization | 1.380s per 10k loops |
Batch size 10 | 0.176s per 10k loops |
Batch size 100 | 0.051s per 10k loops |
Batch size 1000 | 0.036s per 10k loops |
Increasing batch size will have the benefit of increasing the speed, at the cost of re-using the same no_clm value multiple times. I think a sweet spot for this is around 100.
Because this method only uses one core, this speedup will stack with the parallelization suggestion that @pinxau1000 made.
Upvotes: 0
Reputation: 301
You can use multiprocessing library to paralelize your code. If you can work only with numpy's arrays and operations to process your data you can also use numpy's vectorize. As Glauco said you can also optimize your code.
I made a snippet that use multiprocessing. This is not at all optimised, is just a draw for your reference, however in my computer exec time improved roughly 2.9x.
import time
import multiprocessing
from functools import partial
import tqdm
import numpy as np
import scipy.stats as sp
def simulation_0(mu: int, sigma: int, scale: float = np.exp(1)):
no_clm = sp.poisson.rvs(mu=mu, size=1)
# Your scale was np.exp(u), I set it to be a parameter for example purposes.
return sp.lognorm.rvs(s=sigma, scale=scale, size=no_clm)
def simulation_1(xs: int, lim: int, mu: int, sigma: int, scale: float = np.exp(1)):
# want to expand this step for various values of lim and xs, e.g. 5 different options
return np.minimum(lim, np.maximum(0, simulation_0(mu=mu, sigma=sigma, scale=scale) - xs))
def simulation(xs: int, lim: int, mu: int, sigma: int, scale: float = np.exp(1)):
with open("inputs.csv", "a") as fwriter:
fwriter.write(f"{xs}, {lim}\n")
sim_0_res = simulation_0(mu=mu, sigma=sigma, scale=scale)
with open("simulation_0.csv", "a") as fwriter:
fwriter.write(f"{sim_0_res},")
sim_1_res = simulation_1(xs=xs, lim=lim, mu=mu, sigma=sigma, scale=scale)
with open("simulation_1.csv", "a") as fwriter:
fwriter.write(f"{sim_1_res},")
def simulation_wrapper(xs_lim_tuple: tuple, mu: int, sigma: int, scale: float = np.exp(1)):
return simulation(xs=xs_lim_tuple[0], lim=xs_lim_tuple[1], mu=mu, sigma=sigma, scale=scale)
def main():
# Constants
mu = 0.5
sigma = 0.5
# The parameters to sweep
lim = np.linspace(30e3, 50e3, 100) # parameter 1 - needs to vary
xs = np.linspace(1000, 3000, 100) # parameter 2 - needs to vary
# Creates a matrix with all the combinations
sweep_parameters = np.array(np.meshgrid(xs, lim)).T.reshape((np.size(lim) * np.size(xs), 2))
# Create a pool with N threads (N is set to be the number of threads)
pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
# This allows us to "initialize" the function call, since the mu and sigma will not change throughout the
# execution we can already set these parameters.
partial_function = partial(simulation_wrapper, mu=mu, sigma=sigma)
# TQDM is only used to display a progressbar
for _ in tqdm.tqdm(pool.imap(partial_function, sweep_parameters), total=np.shape(sweep_parameters)[0]):
pass
def main_no_parallel():
# Constants
mu = 0.5
sigma = 0.5
# The parameters to sweep
lim = np.linspace(30e3, 50e3, 100) # parameter 1 - needs to vary
xs = np.linspace(1000, 3000, 100) # parameter 2 - needs to vary
# Creates a matrix with all the combinations
sweep_parameters = np.array(np.meshgrid(xs, lim)).T.reshape((np.size(lim) * np.size(xs), 2))
# This allows us to "initialize" the function call, since the mu and sigma will not change throughout the execution we
# can already set these parameters.
partial_function = partial(simulation_wrapper, mu=mu, sigma=sigma)
# TQDM is only used to display a progressbar
for sweep_parameter in tqdm.tqdm(sweep_parameters, total=np.shape(sweep_parameters)[0]):
partial_function(sweep_parameter)
if __name__ == "__main__":
exec_times = []
for i in range(5):
start_time = time.time()
main()
exec_times.append(time.time() - start_time)
print(f"--- {exec_times[-1]} seconds ---")
print(f"\nParallelization Avg Exec Time: {np.average(exec_times)}\n")
exec_times = []
for i in range(5):
start_time = time.time()
main_no_parallel()
exec_times.append(time.time() - start_time)
print(f"--- {exec_times[-1]} seconds ---")
print(f"\nNo-Parallelization Avg Exec Time: {np.average(exec_times)}\n")
Output:
100%|██████████| 10000/10000 [00:01<00:00, 6492.83it/s]
--- 1.5754084587097168 seconds ---
100%|██████████| 10000/10000 [00:01<00:00, 6395.57it/s]
--- 1.585716724395752 seconds ---
100%|██████████| 10000/10000 [00:01<00:00, 6657.03it/s]
--- 1.5238027572631836 seconds ---
100%|██████████| 10000/10000 [00:01<00:00, 6705.56it/s]
--- 1.5113093852996826 seconds ---
100%|██████████| 10000/10000 [00:01<00:00, 6691.30it/s]
--- 1.5149972438812256 seconds ---
Parallelization Avg Exec Time: 1.5422469139099122
100%|██████████| 10000/10000 [00:04<00:00, 2216.83it/s]
--- 4.511844873428345 seconds ---
100%|██████████| 10000/10000 [00:04<00:00, 2202.73it/s]
--- 4.540390491485596 seconds ---
100%|██████████| 10000/10000 [00:04<00:00, 2247.02it/s]
--- 4.450916051864624 seconds ---
100%|██████████| 10000/10000 [00:04<00:00, 2237.04it/s]
--- 4.470769166946411 seconds ---
100%|██████████| 10000/10000 [00:04<00:00, 2215.68it/s]
--- 4.513864517211914 seconds ---
No-Parallelization Avg Exec Time: 4.497557020187378
Upvotes: 1
Reputation: 1465
Fortunately montecarlo is a trivial problem to parallelize, but i suggest to you to proceed in 2 steps:
optimization You can do benchmark in order to find wich statements are the slowest and try to refactor in an efficient way: there are some profilers to do these kind of stuff (i also wrote perf tool for a similar problem)
parallelize computation Using library like ray joblib or celery you will find a lot of benefits in performance, you should split your computation problem in order to create a bunch of tasks, for example if you works on a single machine with 8CPU, you can reduce the montecarlo in a sum of 8 small montecarlo.
Upvotes: 0