Reputation: 73
Say I have 20 processors available. From IPython, I want to pass arguments to an external
program that runs best with 4 threads at a time, and use map_async to keep adding jobs until all jobs are finished. Below is example code where I believe just one processor would be assigned to each job at a time. Is this a case where you would use the 'chunksize' flag? It seems that would do the opposite, i.e., send multiple jobs to one processor.
ipcluster start -n 20 --daemon
import ipyparallel as ipp
import subprocess
def func(args):
    """Call an external program that uses 4 threads."""
    # the thread-count flag must be passed as part of the argument list;
    # the exact flag name depends on the program
    subprocess.call([some_external_program, args, "--nthreads", "4"])
args = [...]
ipyclient = ipp.Client().load_balanced_view()
results = ipyclient.map_async(func, args)
results.get()
Upvotes: 0
Views: 727
Reputation: 38608
If a task is multithreaded, you don't want to be running it on too many engines. If this is the bulk of your work, it is probably best to start n_cpus/n_threads engines instead of n_cpus (5 engines in your case of 20 CPUs and 4 threads). If only a subset of your work is multithreaded like this, then you may want to restrict those tasks to n_cpus/n_threads engines. You can do this with the targets argument when creating a view, which restricts task assignment to a subset of engines:
n_threads = 4
client = ipp.Client()
all_view = client.load_balanced_view() # uses all engines
threaded_view = client.load_balanced_view(targets=client.ids[::n_threads])
This assumes that you have one engine per CPU on a single machine. If you are using multiple machines or the engine count has a different relationship to the number of CPUs, you will have to work out the correct subset of engines to use. Targets can be any manually specified list of engine IDs.
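As a concrete sketch of the slicing above (assuming a single machine with 20 CPUs and engine IDs 0–19, one engine per CPU), this is the subset of engines the threaded view would be restricted to:

```python
n_threads = 4
engine_ids = list(range(20))            # what client.ids would return here
threaded_targets = engine_ids[::n_threads]
print(threaded_targets)                 # [0, 4, 8, 12, 16]
```

Passing that list as targets= when creating the load-balanced view means at most one 4-threaded task lands per group of 4 CPUs; any hand-built list of engine IDs works the same way.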
Upvotes: 1