Narendra

Reputation: 151

Does a Mesos cluster become inaccessible when a Mesos master and an agent go down at the same time?

I'm trying to achieve HA with three machines, running masters and slaves as shown below. I'm using VMs for a local test setup, and my observations are below.

Case 1:

m1 -> leader master

m2 -> non-leader master, slave1

m3 -> non-leader master, slave2

Case 2:

m1 -> non-leader master

m2 -> leader master, slave1

m3 -> non-leader master, slave2

Apologies for trying HA with only 3 machines and for the lengthy problem explanation.

Questions :

Masters :

m1 : mesos-master --ip=192.168.1.36 --hostname=192.168.1.36 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs

m2 : mesos-master --ip=192.168.1.42 --hostname=192.168.1.42 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs

m3 : mesos-master --ip=192.168.1.45 --hostname=192.168.1.45 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs

Slaves :

m2 : mesos-slave --ip=192.168.1.42 --hostname=192.168.1.42 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker

m3 : mesos-slave --ip=192.168.1.45 --hostname=192.168.1.45 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker

Zookeeper Config :

tickTime=2000

initLimit=10

syncLimit=5

dataDir=/opt/ncms/zkWorkDir

clientPort=2181

server.1=192.168.1.42:2888:3888

server.3=192.168.1.36:2888:3888

server.5=192.168.1.45:2888:3888

Setup :

Host: Windows 7 (64GB RAM, 24 Cores )

Virtual Box : each vm(m1, m2, m3) has 2 cores and 2 GB RAM with RHEL 7.2

Upvotes: 0

Views: 203

Answers (1)

rukletsov

Reputation: 1051

In the scenarios you describe, the number of active masters falls below the quorum, which is 2 in your case. This is considered an exceptional situation, and certain operations will not succeed, for example any operation that modifies the distributed registry.
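
The quorum arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, assuming the usual majority rule (the quorum must be greater than half the number of masters); `majority_quorum` and `has_quorum` are hypothetical helper names and not part of Mesos.

```python
def majority_quorum(total_masters: int) -> int:
    """Smallest quorum size that is a strict majority of the masters."""
    return total_masters // 2 + 1


def has_quorum(alive_masters: int, quorum: int) -> bool:
    """True if enough masters are alive to reach the configured quorum."""
    return alive_masters >= quorum


if __name__ == "__main__":
    total = 3   # m1, m2, m3 from the question
    quorum = 2  # the --quorum value used in the question
    # Walk through every possible number of surviving masters.
    for alive in range(total, -1, -1):
        state = "quorum met" if has_quorum(alive, quorum) else "below quorum"
        print(f"{alive} of {total} masters alive: {state}")
```

With the values from the question, this reports that the quorum is met while 3 or 2 masters are alive and not met once only 1 or 0 remain, so a three-master setup with --quorum=2 can lose at most one master.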

Upvotes: 0
