RRC

Reputation: 1442

Starting ipcluster from code

I want to dynamically start clusters from my Jupyter notebook for specific functions. While I can start the cluster and get the engines running, I am having two issues:

(1) I am unable to run the ipcluster command in the background. When I run the command from the notebook, the cell keeps running for as long as the cluster is up, so I can't run further cells in the same notebook. (I can use the engines from a different notebook once they are fired up.) How can I run ipcluster in the background?

(2) My code is always starting 8 engines, regardless of the setting in ipcluster_config.py.

Code:

server_num = 3
ip_new = '10.0.1.' + str(10+server_num)
cluster_profile = "ssh" + str(server_num)

import commands
import time
commands.getstatusoutput("ipython profile create --parallel --profile=" + cluster_profile)

text = """
c.SSHEngineSetLauncher.engines = {'""" +ip_new + """' : 12}
c.LocalControllerLauncher.controller_args = ["--ip=10.0.1.163"]
c.SSHEngineSetLauncher.engine_cmd = ['/home/ubuntu/anaconda2/pkgs/ipyparallel-6.0.2-py27_0/bin/ipengine']
"""

with open("/home/ubuntu/.ipython/profile_" + cluster_profile + "/ipcluster_config.py", "a") as myfile:
    myfile.write(text)

result = commands.getstatusoutput("(ipcluster start --profile='"+ cluster_profile+"' &)")
time.sleep(120)
print(result[1])

Upvotes: 4

Views: 2284

Answers (2)

Ananay Gupta

Reputation: 385

When I saw your question unanswered on StackOverflow, I almost had a heart attack because I had the same problem.

But running the

ipcluster start --help 

command showed this:

--daemonize

This makes it run in the background.

So in your notebook you can do this:

no_engines = 6
!ipcluster start -n {no_engines} --daemonize

Note: This does not work on Windows according to

ipcluster start --help

Upvotes: 3

tachycline

Reputation: 226

I am not familiar with the details of the commands module (it has been deprecated since Python 2.6, according to https://docs.python.org/2/library/commands.html), but I know that with the subprocess module, capturing output makes the interpreter block until the system call completes.
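The difference is easy to demonstrate with a throwaway child process; this is a generic sketch, nothing ipcluster-specific:

```python
import subprocess
import sys
import time

# check_output (like commands.getstatusoutput) waits for the child to exit:
start = time.time()
subprocess.check_output([sys.executable, "-c", "import time; time.sleep(2)"])
blocking = time.time() - start  # roughly 2 seconds

# Popen returns as soon as the child is launched, leaving it in the background:
start = time.time()
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(2)"])
launched = time.time() - start  # well under a second
child.wait()  # clean up the background child
```

This is why launching ipcluster via Popen (without capturing its output) leaves the notebook free to run further cells.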

Also, the number of engines can be set from the command line if you're using the ipcluster command, even without adjusting the configuration files. So, something like this worked for me:

from ipyparallel import Client
import subprocess

nengines = 3  # or however many engines you need

# Popen returns immediately, so the notebook is not blocked;
# passing the count as a separate argument avoids any ambiguity
# in how "-n=3" would be parsed
subprocess.Popen(["ipcluster", "start", "-n", str(nengines)])
rc = Client()
# send your jobs to the engines; when done do
subprocess.Popen(["ipcluster", "stop"])

This doesn't, of course, address the issue of adding or removing hosts dynamically (which from your code it looks like you may be trying to do), but if you only care how many hosts are available, and not which ones, you can make a default ipcluster configuration which includes all of the possible hosts, and allocate them as needed via code similar to the above.

Note also that it can take a second or two for ipcluster to spin up, so you may want to add a time.sleep call between your first subprocess.Popen call and trying to spawn the client.
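Rather than a fixed sleep, you can poll until the engines have registered. Here is a minimal generic polling helper; the `wait_until` name and the commented ipyparallel usage are illustrative, not part of the library:

```python
import time

def wait_until(predicate, timeout=60, interval=1.0):
    """Poll predicate() until it returns True; give up after timeout seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# With ipyparallel, something along these lines (sketch):
#   rc = Client()
#   wait_until(lambda: len(rc.ids) >= nengines, timeout=120)
```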

Upvotes: 0
