Reputation: 1442
I want to dynamically start clusters from my Jupyter notebook for specific functions. While I can start the cluster and get the engines running, I am having two issues:
(1) I am unable to run the ipcluster command in the background. When I run the command from a notebook cell, the cell keeps running for as long as the cluster is up, i.e. I can't run further cells in the same notebook. I can use the engines, once they are fired up, from a different notebook. How can I run ipcluster in the background?
(2) My code is always starting 8 engines, regardless of the setting in ipcluster_config.py.
Code:
server_num = 3
ip_new = '10.0.1.' + str(10+server_num)
cluster_profile = "ssh" + str(server_num)
import commands
import time
commands.getstatusoutput("ipython profile create --parallel --profile=" + cluster_profile)
text = """
c.SSHEngineSetLauncher.engines = {'""" + ip_new + """' : 12}
c.LocalControllerLauncher.controller_args = ["--ip=10.0.1.163"]
c.SSHEngineSetLauncher.engine_cmd = ['/home/ubuntu/anaconda2/pkgs/ipyparallel-6.0.2-py27_0/bin/ipengine']
"""
with open("/home/ubuntu/.ipython/profile_" + cluster_profile + "/ipcluster_config.py", "a") as myfile:
    myfile.write(text)
result = commands.getstatusoutput("(ipcluster start --profile='"+ cluster_profile+"' &)")
time.sleep(120)
print(result[1])
Upvotes: 4
Views: 2284
Reputation: 385
When I saw your question unanswered on StackOverflow, I almost had a heart attack because I had the same problem.
But running the ipcluster start --help command showed this:
--daemonize
This makes it run in the background.
So in your notebook you can do this:
no_engines = 6
!ipcluster start -n {no_engines} --daemonize
Note: this does not work on Windows, according to ipcluster start --help.
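If you prefer plain subprocess over the ! shell magic (for example, when the same code also needs to run outside a notebook), the daemonized start can be sketched like this; the actual call is commented out since it requires ipyparallel to be installed, and the engine count is just an example:

```python
import subprocess

no_engines = 6  # example engine count
cmd = ["ipcluster", "start", "-n", str(no_engines), "--daemonize"]

# On a machine with ipyparallel installed you would run:
# subprocess.check_call(cmd)
print(" ".join(cmd))  # ipcluster start -n 6 --daemonize
```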
Upvotes: 3
Reputation: 226
I am not familiar with the details of the commands module (it has been deprecated since 2.6, according to https://docs.python.org/2/library/commands.html), but I know that with the subprocess module, capturing output will make the interpreter block until the system call completes.
Also, the number of engines can be set from the command line if you're using the ipcluster
command, even without adjusting the configuration files. So, something like this worked for me:
from ipyparallel import Client
import subprocess
nengines = 3 # or whatever
subprocess.Popen(["ipcluster", "start", "-n={:d}".format(nengines)])
rc = Client()
# send your jobs to the engines; when done do
subprocess.Popen(["ipcluster", "stop"])
This doesn't, of course, address the issue of adding or removing hosts dynamically (which from your code it looks like you may be trying to do), but if you only care how many hosts are available, and not which ones, you can make a default ipcluster configuration which includes all of the possible hosts, and allocate them as needed via code similar to the above.
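If you do go the route of one default profile that lists every possible host, the c.SSHEngineSetLauncher.engines mapping can be generated rather than hand-edited. A minimal sketch; the helper name and the host addresses are made up for illustration:

```python
# Hypothetical helper: build the SSHEngineSetLauncher.engines config line
# for a pool of hosts, so one profile covers every machine you might use.
def engines_config(hosts, engines_per_host):
    mapping = {host: engines_per_host for host in hosts}
    return "c.SSHEngineSetLauncher.engines = %r\n" % (mapping,)

pool = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # made-up addresses
print(engines_config(pool, 12), end="")
```

You would append the returned line to the profile's ipcluster_config.py, much as the question's code does.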
Note also that it can take a second or two for ipcluster to spin up, so you may want to add a time.sleep call between your first subprocess.Popen call and trying to spawn the client.
Upvotes: 0