wiwa1978

Reputation: 2687

Why do I get "disconnected framework"?

I have installed and configured Mesos and Marathon. Whenever I try to schedule an application, it remains in the 'Waiting' state, which seems to indicate that Marathon is waiting for offers from Mesos.

When I check the logs in Mesos, I see the following:

I0425 20:22:10.313910  4279 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at [email protected]:50892
I0425 20:22:10.313987  4279 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [  ]
I0425 20:22:10.313994  4279 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at [email protected]:50892 already subscribed, resending acknowledgement
W0425 20:22:10.314007  4279 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at [email protected]:50892
E0425 20:22:10.314193  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:11.226884  4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at [email protected]:35928
I0425 20:22:11.226959  4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:11.226969  4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928 already subscribed, resending acknowledgement
W0425 20:22:11.226982  4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928
E0425 20:22:11.227226  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:12.113598  4281 http.cpp:312] HTTP GET for /master/state from 192.0.2.1:49698 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
I0425 20:22:12.314221  4286 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at [email protected]:50892
I0425 20:22:12.314304  4286 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [  ]
I0425 20:22:12.314312  4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at [email protected]:50892 already subscribed, resending acknowledgement
W0425 20:22:12.314337  4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0001 (chronos-2.4.0) at [email protected]:50892
E0425 20:22:12.314524  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.081887  4284 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at [email protected]:35928
I0425 20:22:13.081964  4284 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:13.081987  4284 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928 already subscribed, resending acknowledgement
W0425 20:22:13.082005  4284 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928
E0425 20:22:13.082314  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:13.221590  4282 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at [email protected]:35928
I0425 20:22:13.221664  4282 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:13.221674  4282 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928 already subscribed, resending acknowledgement
W0425 20:22:13.221688  4282 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928
E0425 20:22:13.222162  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected
I0425 20:22:14.412215  4286 master.cpp:2231] Received SUBSCRIBE call for framework 'marathon' at [email protected]:35928
I0425 20:22:14.412281  4286 master.cpp:2302] Subscribing framework marathon with checkpointing enabled and capabilities [  ]
I0425 20:22:14.412289  4286 master.cpp:2312] Framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928 already subscribed, resending acknowledgement
W0425 20:22:14.412302  4286 master.hpp:1764] Master attempted to send message to disconnected framework c16a5bfb-838e-4d43-bf3c-21bf94358ab5-0000 (marathon) at [email protected]:35928
E0425 20:22:14.412495  4287 process.cpp:1958] Failed to shutdown socket with fd 39: Transport endpoint is not connected

Any idea why it mentions a 'disconnected' framework? In the Mesos UI I can see the 3 slaves, and both the Marathon and Chronos frameworks are listed under 'active frameworks'.

The /etc/hosts file contains the following entries:

192.0.2.11  master1  # VAGRANT: cd38e81ab8742b23dfbcb913468368ea (master1) / 1b611425-dbad-4bd0-8727-4169c09ec045
192.0.2.51  slave1  # VAGRANT: 94630539b67d178dddffda29a0313a75 (slave1) / 1a1694de-2bd2-4d96-bdf2-dd6767d1f310
192.0.2.52  slave2  # VAGRANT: 306e67b33b327b3d1c9990bf1316a321 (slave2) / bdbd677e-5298-4d49-90a8-e521139dd127
192.0.2.12  master2  # VAGRANT: fb338e9e9c001a5bfab605387ba88d02 (master2) / bdccfd80-b1e6-48a0-8986-b24c7cbd7a25
192.0.2.53  slave3  # VAGRANT: 3913b3358eadc90c622859ddb90bfede (slave3) / 786cbe69-2af5-43b7-8e70-d6cc07d4ddf4
192.0.2.13  master3  # VAGRANT: 92cdd6e36a6c0391e2a66f73661e56fe (master3) / 03bb2c16-f474-4412-b8f4-fce82e12955c

Note: in case more info is needed on how the cluster was installed, please refer to this

Upvotes: 2

Views: 1077

Answers (2)

TooAngel

Reputation: 883

You can also set LIBPROCESS_IP as an environment variable. I think this is better than changing /etc/hosts.

Found the solution here: https://groups.google.com/forum/#!topic/marathon-framework/1qboeZTOLU4
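To make that concrete, here is a minimal sketch of preparing an environment with LIBPROCESS_IP set before launching a framework process. The helper name `env_with_libprocess_ip` is hypothetical, and the loopback check is my own assumption about what you would want to guard against (an address like 127.0.1.1 is the kind the master cannot connect back to).

```python
import ipaddress
import os


def env_with_libprocess_ip(ip, base_env=None):
    """Return a copy of the environment with LIBPROCESS_IP set to `ip`.

    Hypothetical helper: it only builds the environment dict and
    does not start any process itself.
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.is_loopback:
        # A loopback address would be advertised to the master,
        # which cannot reach it from another host.
        raise ValueError(f"{ip} is a loopback address")
    env = dict(os.environ if base_env is None else base_env)
    env["LIBPROCESS_IP"] = str(addr)
    return env
```

You would then pass the result as the `env` argument of `subprocess.run` (or `subprocess.Popen`) together with whatever command you normally use to start Marathon or Chronos on that host, using that host's own routable address, e.g. 192.0.2.11 on master1.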

Upvotes: 2

Tobi

Reputation: 31479

I guess you need to make sure that each hostname resolves to its actual, non-loopback IP address.

That's at least what fixed my problems when Marathon etc. tried to bind to 127.0.1.1 on Ubuntu. On each host, add the IP-to-hostname mappings to the /etc/hosts file, e.g.

192.0.2.11 master1

Either place this entry before the line that maps 127.0.1.1 to the hostname, or remove the 127.0.1.1 entry entirely. The Vagrant plugin vagrant-hostsupdater might help.
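If you want to check a host for this quickly, something like the following rough sketch may help. It scans the contents of a hosts file and reports hostnames whose first mapping is a loopback address. It assumes the first matching line wins during lookup, which is the usual behaviour but worth verifying on your system. The function names and the default ignore list are my own, not part of any tool.

```python
import ipaddress


def first_mappings(hosts_text):
    """Map each hostname to the first IP listed for it in hosts_text."""
    first = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            first.setdefault(name, ip)  # keep only the first occurrence
    return first


def loopback_hostnames(hosts_text,
                       ignore=("localhost", "ip6-localhost", "ip6-loopback")):
    """Return hostnames (sorted) whose first mapping is a loopback address."""
    flagged = []
    for name, ip in first_mappings(hosts_text).items():
        if name in ignore:
            continue
        try:
            if ipaddress.ip_address(ip).is_loopback:
                flagged.append(name)
        except ValueError:
            continue  # skip lines that do not start with a valid IP
    return sorted(flagged)
```

Running `loopback_hostnames(open("/etc/hosts").read())` on each master and slave should return an empty list once the entries are in the right order. Any name it prints is one that would still resolve to 127.x.x.x first.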

Upvotes: 1
