Leonid Mirsky

Reputation: 831

Turning cassandra inter-node encryption on causes "Unable to gossip with any seeds" exception

I am trying to turn on inter-node encryption in Cassandra 2.1. For testing purposes, I am trying to start a 2-node cluster.

I am running each node inside a Docker container, on 2 separate EC2 instances. Without inter-node encryption, everything works as expected.

I am generating the ssl keys using the following script (taken from https://docs.jboss.org/author/display/RHQ/Cassandra+Node+To+Node+Encryption?_sscc=t):

  for ((a=0; a < NUMBER_OF_NODES ; a++))
  do
     node_id=node${a}

     echo -e "Start building certificates for ${node_id}"
     echo -e "=========================================="
     rm -vf ./${node_id}.keystore
     rm -vf ./${node_id}.cer

     #1 Generate key and store
     ${java_folder}/keytool -genkey -v -keyalg RSA -keysize 1024 -alias ${node_id} -keystore ${node_id}.keystore -storepass "${node_id}store" -dname 'CN=RHQ' -keypass "${node_id}store" -validity 3650

     #2 Extract public certificate
     ${java_folder}/keytool -export -v -alias ${node_id} -file ${node_id}.cer -keystore ${node_id}.keystore -storepass "${node_id}store"

     #3 Add public certificate to global keystore
     ${java_folder}/keytool -import -v -trustcacerts -alias ${node_id} -file ${node_id}.cer -keystore global.truststore -storepass 'globalstore' -noprompt

     echo -e "========================================="
     echo -e "Done building certificates for ${node_id}\n\n"
  done

I am also adding the following configuration to each node's cassandra.yaml file (on each node, node0 is replaced with that node's own name):

server_encryption_options:
   internode_encryption: all
   keystore: /keystores/node0.keystore
   keystore_password: node0store
   truststore: /keystores/global.truststore
   truststore_password: globalstore

node1 is configured with node0 as its seed. I start node0 and wait until it is up; I see no exceptions, and everything works as expected. Then I start node1, which throws the following (only visible when the log level is set to "trace"):

TRACE 08:14:16 unable to connect to 172.12.1.11/172.12.1.11
javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
        at sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:671) ~[na:1.7.0_65]
        at sun.security.ssl.InputRecord.read(InputRecord.java:504) ~[na:1.7.0_65]
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927) ~[na:1.7.0_65]
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312) ~[na:1.7.0_65]
        at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:702) ~[na:1.7.0_65]
        at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122) ~[na:1.7.0_65]
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_65]
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_65]
        at org.apache.cassandra.io.util.DataOutputStreamPlus.flush(DataOutputStreamPlus.java:55) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:347) [apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:163) [apache-cassandra-2.1.1.jar:2.1.1]
TRACE 08:14:17 Expired 0 entries
TRACE 08:14:20 Expired 0 entries
TRACE 08:14:22 Expired 0 entries
TRACE 08:14:25 Expired 0 entries
TRACE 08:14:27 Expired 0 entries
TRACE 08:14:30 Expired 0 entries
TRACE 08:14:32 Expired 0 entries
DEBUG 08:14:34 Copy GC in 14ms.  CMS Old Gen: 9537256 -> 14901648; Eden Space: 41943040 -> 0; Survivor Space: 5242872 -> 5242880
TRACE 08:14:35 Expired 0 entries
ERROR 08:14:37 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1221) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:700) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:637) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:529) ~[apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:324) [apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:443) [apache-cassandra-2.1.1.jar:2.1.1]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:532) [apache-cassandra-2.1.1.jar:2.1.1]
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1221)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:700)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:637)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:529)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:324)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:443)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:532)
Exception encountered during startup: Unable to gossip with any seeds

It is also worth noting that on node0 port 7001 is open and accessible by node1.

Upvotes: 2

Views: 620

Answers (1)

Leonid Mirsky

Reputation: 831

As is usually the case, the problem was related to the environment configuration and not to the actual Cassandra settings.

I am running the Cassandra instances isolated inside Docker containers on a CoreOS cluster. I forgot that etcd's default SSL port and Cassandra's default SSL inter-node communication port are both 7001.
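For anyone hitting the same symptom, a quick way to spot this kind of clash is to check which process actually owns port 7001 on the host. The following is only a sketch: the helper name listeners_on_port is made up for illustration, and it assumes the host has `ss` from iproute2 and that its output uses the usual column layout (state first, local address:port fourth).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra, etcd or Docker):
# reads `ss -ltnp` output on stdin and prints only the LISTEN rows
# whose local address ends in the given port.
listeners_on_port() {
    local port="$1"
    # $1 is the socket state, $4 is "local-address:port".
    awk -v port="$port" '$1 == "LISTEN" && $4 ~ (":" port "$")'
}

# Usage on the host (root is needed to see process names):
#   sudo ss -ltnp | listeners_on_port 7001
```

If the row that comes back names a process other than the Cassandra JVM (or Docker's port forwarder for the Cassandra container), then node1 is most likely talking to that other service, which would be consistent with the "Unrecognized SSL message, plaintext connection?" trace.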

Once I changed one of the two systems to use a different port, the issue was resolved. I think the error message could be clearer and should not require the trace log level. A clearer message would have saved me the time I spent tracing network packets for answers.
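As an illustration of moving the Cassandra side, the encrypted inter-node port is controlled by ssl_storage_port in cassandra.yaml. The sketch below rewrites that setting in place; the helper name set_ssl_storage_port, the config path and the replacement port 7011 are all assumptions chosen for the example, not values from my setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point ssl_storage_port in a cassandra.yaml file
# at a new port. A backup of the original is kept as <file>.bak.
set_ssl_storage_port() {
    local file="$1" port="$2"
    # Anchored at line start so the plain "storage_port:" line is untouched.
    sed -i.bak -E "s/^ssl_storage_port:.*/ssl_storage_port: ${port}/" "$file"
}

# Usage (path and port are examples only):
#   set_ssl_storage_port /etc/cassandra/cassandra.yaml 7011
```

The same value should be applied on every node, since nodes use it when connecting to their peers, and the new port also has to be published by the Docker container and allowed by the EC2 security group in place of 7001.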

Upvotes: 3
