DataStax agent fails to connect to DSE (OpsCenter 6)
I am trying to run a single-region, multi-node cluster with DataStax OpsCenter 6.0 on EC2, but when I add a node it fails to start.
In the node's install job I get an error: dse failed to start.
I have 3 nodes on EC2 in the same region, and OpsCenter running on a 4th EC2 server.
I am new to Cassandra and DataStax, and after reading the DataStax documentation on snitches, it seems my issue is that my endpoint_snitch is wrong.
My endpoint_snitch is currently set to GossipingPropertyFileSnitch, but OpsCenter does not let me choose another option; Ec2Snitch is not available among the endpoint_snitch choices.
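For reference, this is roughly what a GossipingPropertyFileSnitch setup looks like on a node (a sketch; the paths are DSE package-install defaults and the dc/rack names are placeholders):

```bash
# Sketch: GossipingPropertyFileSnitch reads each node's DC/rack from
# cassandra-rackdc.properties (path shown is the DSE package-install
# default; adjust for tarball installs). Names below are placeholders.
cat /etc/dse/cassandra/cassandra-rackdc.properties
# dc=us-east-lcm        <- any DC name you choose
# rack=rack1            <- rack name for this node
# prefer_local=true     <- optional: prefer private IPs for intra-DC traffic

# And cassandra.yaml should point at the snitch:
grep endpoint_snitch /etc/dse/cassandra/cassandra.yaml
# endpoint_snitch: GossipingPropertyFileSnitch
```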
Do you have any idea what the right configuration is for DataStax OpsCenter 6.0 to run multiple nodes properly on EC2?
Edit: it seems that OpsCenter LCM is working properly, but when the agent starts running on a node I get an error in /var/log/datastax-agent/agent.log:
Unable to connect via JMX, target cassandra is likely unavailable or unreachable, please check cassandra health and connection settings jmx_host: 127.0.0.1 jmx_port: 7199 jmx credentials withheld from logging.
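That error means the agent could not reach DSE's JMX port on the node itself, which usually just means DSE never came up. A few checks to run from the DSE node (a sketch, assuming a package install; service names may differ):

```bash
# Is the DSE service actually up? (package-install service name)
sudo service dse status

# Is anything listening on the JMX port the agent is trying (7199)?
sudo ss -tlnp | grep 7199

# Can nodetool reach JMX locally, using the same host/port as the agent?
nodetool -h 127.0.0.1 -p 7199 status
```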
Answers (2)
I solved my issue, but I didn't find out why dse failed to start when the agent ran.
I did find a way to make OpsCenter LCM run and install my single-region cluster on EC2. After reading the DataStax documentation on planning EC2 deployments, I used an EC2 AMI from a trusted source instead of the basic Ubuntu AMI.
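In case it helps anyone, AMIs from a specific publisher can be listed with the AWS CLI (a sketch; the owner account ID and the name filter are placeholders you would replace with the vendor's actual values):

```bash
# Sketch: list AMIs published by a specific (trusted) account.
# <trusted-owner-account-id> and the name filter are placeholders.
aws ec2 describe-images \
  --owners <trusted-owner-account-id> \
  --filters "Name=name,Values=ubuntu/images/*" \
  --query 'Images[].[ImageId,Name,CreationDate]' \
  --output table
```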
It sounds like you're using the OpsCenter Lifecycle Manager feature to deploy your cluster. I'm an LCM dev. It's hard to tell exactly what's going on from your initial report... but some general thoughts:
- As Chris Lohfink said, don't worry about the snitch. It's not necessary to use Ec2Snitch on EC2. GPFS can do everything Ec2Snitch can do and more, which is why LCM uses it.
- LCM cannot currently protect you from invalid DSE configs. OPSC-7414 is the internal ticket number that we use to track our plans for improved up-front validation of DSE configs. If you have support, contact them to get your company added to that issue so it gets prioritized more quickly.
- In the meantime, if you use a broken DSE config, DSE will error on startup and you'll have to SSH into the DSE node and look at the DSE logs there to figure out what went wrong (see the log-checking sketch after this list). It's not always simple to understand, but it's the only way to sort out DSE startup problems.
- If you're new to DSE, the simplest thing to do might be to start with fresh target boxes and a fresh config profile, and leave the config as close to default as possible for the initial install. Once you've got your cluster running, you can execute additional configure jobs to change one thing at a time; then, when you encounter a problem, you'll have a better idea of which setting caused it.
- Also keep your network as simple as possible in the beginning. That means put your targets all in the same subnet, together with OpsCenter, in a single VPC in a single region. Disable iptables on your nodes before running LCM. Set your security group to allow all traffic between members of that subnet (but probably not from the internet, even though that complicates things a bit); see the second sketch after this list. Once you have the simplest, most permissive network setup working, you can expand to more complex network environments, confident that any new problems are related to your network config.
- Messing up the various IPs in the node form can cause DSE to fail to start as well. If you're using the very simple all-hosts-in-one-subnet network setup I described earlier, use the target's private IP for the ssh-management-address and leave all the other addresses blank.
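To inspect a DSE startup failure on a node, something like this (a sketch, assuming a package install; tarball installs put logs under the install directory instead):

```bash
# Cassandra's own log usually has the real cause of a startup failure:
sudo tail -n 200 /var/log/cassandra/system.log

# Errors that happen before logging is configured often land in the
# captured stdout/stderr (path is an assumption; check your install):
sudo tail -n 50 /var/log/cassandra/output.log
```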
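And for the permissive-within-the-subnet network setup, something along these lines (a sketch; sg-xxxxxxxx is a placeholder, and the firewall commands depend on your distro):

```bash
# Sketch: allow all traffic between instances that share one security group.
# Using the group as its own source means "anything from other members of
# this group", without opening the nodes to the internet.
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol -1 \
  --source-group sg-xxxxxxxx

# On each node, make sure a host firewall isn't in the way (Ubuntu example):
sudo ufw disable        # or: sudo iptables -F  to flush iptables rules
```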