wemu

Reputation: 8160

Starting spinnaker in own kubernetes cluster hangs at data-cassandra-keys?

I want to deploy Spinnaker into our company Kubernetes cluster. I've cloned the master branch of spinnaker and modified the spinnaker-local files accordingly. We use an internal Docker registry, so I copied the required images from Docker Hub into our own.

When I start the "startup-all.sh" script, the Redis images seem to fire up fine. Then it tries to start the Cassandra databases, but the Job "data-cassandra-keys" never finishes; it creates new pods over and over again.

I tried to get some logs using:

c:\spinnaker\experimental\kubernetes>kubectl log data-cassandra-keys-w4yil
W0601 16:20:02.457396   10336 cmd.go:207] log is DEPRECATED and will be removed in a future version. Use logs instead.
Connecting to...
172.23.77.106
9042
Connection error: ('Unable to connect to any servers', {'172.23.77.106': error(None, "Tried connecting to [('172.23.77.106', 9042)]. Last error: timed out")})
Failed to add keyspace create_echo_keyspace.cql

There are dozens of data-cassandra-keys-xxxxx pods - they all show the same output, and Kubernetes keeps creating new ones.

The startup scripts are stuck at:

SUCCESS=$(kubectl get job data-cassandra-keys --namespace=spinnaker -o=jsonpath="{.status.succeeded}")

while [ $SUCCESS -ne "1" ]; do
    SUCCESS=$(kubectl get job data-cassandra-keys --namespace=spinnaker -o=jsonpath="{.status.succeeded}")
done
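For reference, this is roughly how I've been inspecting the job by hand while that loop spins (standard kubectl commands, namespace taken from the script above):

kubectl describe job data-cassandra-keys --namespace=spinnaker     # pod count, failures, recent events
kubectl get job data-cassandra-keys --namespace=spinnaker -o yaml   # the raw status the loop polls; .status.succeeded never reaches 1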

I can't figure out what setting I have to change to make this work (i.e. how Cassandra knows which host to connect to). I also don't really understand why the "data-cassandra-keys" pods are recreated over and over again.
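In case it helps, this is how I've been poking at the service side (assuming 172.23.77.106 from the log above is the cluster IP of the Cassandra service; the service name "data-cassandra" below is just my guess based on the pod names):

kubectl get svc --namespace=spinnaker                        # which service owns 172.23.77.106?
kubectl get endpoints data-cassandra --namespace=spinnaker   # assumed service name; empty endpoints = no ready Cassandra pod
kubectl get pods --namespace=spinnaker | grep cassandra      # is the Cassandra pod itself Running/Ready?

If the endpoints list is empty, I guess the keys job times out simply because nothing is listening behind that service yet.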

The events are full of:

6m        6m        1         data-cassandra-keys-sxp3o   Pod                                         Normal    Scheduled          {default-scheduler }           Successfully assigned data-cassandra-keys-sxp3o to ld9c0193.corp.test
6m        6m        1         data-cassandra-keys-sxp3o   Pod       spec.containers{cassandra-keys}   Normal    Pulled             {kubelet ld9c0193.corp.test}   Container image "docker-registry.corp.ch/kubernetes-spinnaker/cassandra-keys:v2" already present on machine
6m        6m        1         data-cassandra-keys-sxp3o   Pod       spec.containers{cassandra-keys}   Normal    Created            {kubelet ld9c0193.corp.test}   Created container with docker id 46de7bd5f425
6m        6m        1         data-cassandra-keys-sxp3o   Pod       spec.containers{cassandra-keys}   Normal    Started            {kubelet ld9c0193.corp.test}   Started container with docker id 46de7bd5f425

Any hint on what's going on or where to look is appreciated :)

Thanks!

Upvotes: 1

Views: 254

Answers (1)

lwander

Reputation: 166

The "data-cassandra-keys" pod gets recreated by the Kubernetes Job controller, but should probably have its restart policy changed to avoid creating too many dead pods. It retries for some time because Cassandra can take a while to startup, and therefore shouldn't fail after the first attempt to create the keyspaces.

A known issue is that Cassandra fails to start correctly when configured to use persistent disks, due to a permissions issue. Have you checked the logs for the Cassandra pod itself (data-cassandra-v000-xxxx)?
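Something along these lines should show whether you're hitting that permissions problem (the pod name suffix will differ in your cluster, and /var/lib/cassandra is just the default data path of the stock Cassandra image - adjust if your image uses something else):

kubectl logs data-cassandra-v000-xxxx --namespace=spinnaker
kubectl exec data-cassandra-v000-xxxx --namespace=spinnaker -- ls -ld /var/lib/cassandra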

Upvotes: 1
