kostja

Reputation: 61548

Meld error with DataStax Enterprise

Provisioning a DSE cluster with the Lifecycle Manager fails consistently. The master node (the one OpsCenter is also running on) installed correctly. Every other node fails the install task (and the config task as well). I have double-checked the SSH credentials and ports. Any ideas on how to investigate further and fix the issue would be great.

Please excuse the length - trying to provide all of the relevant info.

Ubuntu 14.04.4, JRE: 1.8.0.91, DSE 5.0.0

job events:

   ...
    "results": [
        {
            "event-subtype": "start",
            "event-type": "milestone",
            "message": "job started...",
            ...
        },
        {
            "event-subtype": "invocation",
            "event-type": "shell-command",
            "message": "Invoked command: if [ -x $(which yum) ] && [ -f /etc/redhat-release -o -f /etc/SuSE-release ]; then echo -n yum; elif [ -x $(which apt-get) ]; then echo -n apt; fi"
            ...
        },
        {
            "event-subtype": "uploaded-facts",
            "event-type": "milestone",
            "message": "Uploaded facts to OpsCenter server",
            ...
        },
        {
            "event-subtype": "meld-error",
            "event-type": "error",
            "message": "Unexpected error executing meld",
            ...
        },
        {
            "event-subtype": "MeldError",
            "event-type": "error",
            "message": "Meld failed on: name=\"NODE-2\" ssh-management-address=\"<IP>\" node-id=\"<node-id>\" job-id=\"<job-id>\" stdout=\"\r\n\" stderr=\"\"",
            ...
        }
    ]

opscenterd.log

/var/log/opscenter/opscenterd.log-2016-07-02 16:34:16,848 [opscenterd]  INFO: Install job started for node name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" (async-thread-macro-53)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:16,850 [opscenterd]  INFO: using ssh-private-key (async-thread-macro-53)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:18,478 [opscenterd]  INFO: Received milestone from node name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" message="Uploaded facts to OpsCenter server" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" (MainThread)
/var/log/opscenter/opscenterd.log:2016-07-02 16:34:18,675 [opscenterd] ERROR: Received error from node event-subtype="meld-error" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" name="NODE-2" traceback="Traceback (most recent call last):
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 3313, in run
/var/log/opscenter/opscenterd.log-    rc = engine.go()
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 2991, in go
/var/log/opscenter/opscenterd.log-    self.file_manager.get_config_files()
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 1280, in get_config_files
/var/log/opscenter/opscenterd.log-    {\"accept\": \"application/json\"})
/var/log/opscenter/opscenterd.log:  File \"meld.py\", line 598, in get
/var/log/opscenter/opscenterd.log-    return json.loads(response.read())
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/socket.py\", line 351, in read
/var/log/opscenter/opscenterd.log-    data = self._sock.recv(rbufsize)
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 549, in read
/var/log/opscenter/opscenterd.log-    return self._read_chunked(amt)
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 609, in _read_chunked
/var/log/opscenter/opscenterd.log-    value.append(self._safe_read(amt))
/var/log/opscenter/opscenterd.log-  File \"/usr/lib/python2.7/httplib.py\", line 666, in _safe_read
/var/log/opscenter/opscenterd.log-    raise IncompleteRead(''.join(s), amt)
/var/log/opscenter/opscenterd.log:IncompleteRead: IncompleteRead(4153 bytes read, 4039 more expected)" ssh-management-address="<IP>" node-id="<node-id>" event-type="error" message="Unexpected error executing meld" (MainThread)
/var/log/opscenter/opscenterd.log-2016-07-02 16:34:18,892 [opscenterd] ERROR: Install job a630c081-6ac1-4b00-ac08-18fef320e0d5 failed! (async-thread-macro-54)
/var/log/opscenter/opscenterd.log:2016-07-02 16:34:19,105 [opscenterd] ERROR: Meld failed on: name="NODE-2" ssh-management-address="<IP>" node-id="<node-id>" job-id="a630c081-6ac1-4b00-ac08-18fef320e0d5" stdout="
/var/log/opscenter/opscenterd.log-" stderr="" (async-thread-macro-53)

Thank you

EDIT: I captured the HTTP traffic between NODE-2 and the master. The error occurs while the config files are being transferred: one of them is not transferred completely for some reason. The JSON looks reasonable until some gibberish appears.

 {"filename": "dse.yaml", "contents": {"internode_messaging_options": {"client_worker_threads": 16, "port": 8609, "server_worker_threads": 16, "server_acceptor_thread

Yvatv+~UK{.kMI4^QOrqQTDX_3"DPm,v!"H&M$!1M7

LRYCs{l>-df;cj

W6C9dq

The config files are valid and do work on the master node. Only the replication fails.
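One way to narrow down where a captured response body stops being valid is to let a JSON parser report the offset. The following is only a minimal sketch: `find_json_break` is a hypothetical helper name, and it assumes the captured body has already been saved as text (for example, exported from the packet capture).

```python
import json


def find_json_break(payload, context=40):
    """Return (offset, preceding_text) where parsing fails, or None if valid.

    `payload` is the captured response body as a string. `context` is how
    many characters before the failure offset to return for inspection.
    """
    try:
        json.loads(payload)
    except json.JSONDecodeError as err:
        # err.pos is the index in `payload` where the parser gave up.
        start = max(0, err.pos - context)
        return err.pos, payload[start:err.pos]
    return None
```

Comparing the reported offset with the byte counts in the `IncompleteRead` message may help show whether the body was cut off or corrupted in transit, although a parser offset alone does not prove either.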

Upvotes: 2

Views: 2189

Answers (2)

Lewisr650

Reputation: 1

You can specify the private IP for Listen Address and 0.0.0.0 for broadcast address and LCM should be able to provision appropriately.

Upvotes: 0

Mike Lococo

Reputation: 684

OpsCenter LCM developer here. Your issue is caused by OPSC-8851 in the LCM known issues list: http://docs.datastax.com/en/opscenter/6.0/opsc/release_notes/opscReleaseNotes600.html

This is only triggered under certain network conditions and was discovered too close to release to be fixed in 6.0.0. It's a high priority, though, and will be fixed in a subsequent release soon. Unfortunately, I don't think there's anything you can do to work around this in the field. If you're a DataStax customer, you could contact support and potentially get a patch now to work around the issue. Otherwise, the only thing I can suggest is to watch the upcoming release notes.

Edit: I should also note that in our tests the issue is intermittent. LCM is designed so that failed jobs can be rerun safely (i.e. it's idempotent), so in all but the most extreme cases you can also work around this simply by rerunning your job.
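Because rerunning is safe, the "just rerun it" workaround can be expressed as a simple retry loop. This is a rough sketch, not LCM code: `run_job` is a hypothetical callable standing in for whatever you use to trigger the job, and it is assumed to raise `IncompleteRead` when the transfer is cut short, as in the traceback from the question.

```python
import time
from http.client import IncompleteRead


def run_with_retries(run_job, attempts=3, delay_seconds=5.0):
    """Call run_job() until it succeeds or `attempts` tries are used up.

    Only safe because the job is idempotent: repeating a partially
    completed run must not leave the node in a worse state.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_job()
        except IncompleteRead as err:
            last_error = err
            print(f"attempt {attempt} failed: {err!r}")
            if attempt < attempts:
                # Pause before retrying, since the failure is intermittent.
                time.sleep(delay_seconds)
    raise last_error
```

A fixed delay is used here for simplicity; if the failures turn out to correlate with network load, a longer or increasing delay may be worth trying.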

Upvotes: 1
